An influence-based fast preceding questionnaire model for elderly assessments

Abstract

To improve the efficiency of elderly assessments, an influence-based fast preceding questionnaire model (FPQM) is proposed. Compared with traditional assessments, the FPQM optimizes questionnaires by reordering their attributes. The values of low-ranking attributes can be predicted by the values of the high-ranking attributes. Therefore, the number of attributes can be reduced without redesigning the questionnaires. A new function for calculating the influence of the attributes is proposed based on probability theory. Reordering and reducing algorithms are given based on the attributes’ influences. The model is verified through a practical application. The practice in an elderly-care company shows that the FPQM can reduce the number of attributes by 90.56% with a prediction accuracy of 98.39%. Compared with other methods, such as the Expert Knowledge, Rough Set and C4.5 methods, the FPQM achieves the best performance. In addition, the FPQM can also be applied to other questionnaires.

Keywords

Questionnaire; reorder; reduce; fast preceding questionnaire model; elderly assessment

1. Introduction

Questionnaires have been widely used in various fields, including elderly assessments. Several questionnaires have been developed and are currently in extensive use to assess health-related quality of life (HRQOL)[20]. Aging is an increasingly serious social phenomenon in China, and there is a strong need for care services. Assessments of the elderly are essential for providing personalized services. Existing assessment methods are usually based on the Barthel Index[18] and the national industry standard for the ability assessment of elderly adults[28]. Many investigation attributes are needed to systematically obtain information about the elderly. The elderly are asked about multiple attributes in succession. These assessment methods are inefficient, and the order of the attributes is not reasonable. When there is a relationship between attributes, some unknown attributes can be predicted by known attributes, and a more reasonable order should be determined[8, 9, 19].

Classical Test Theory (CTT), Rasch Analysis (RA), decision rules, and expert judgment[4, 10, 15, 21, 22, 25] have been applied to reduce the length of health questionnaires. However, the removed attributes can still provide additional information, and a more reasonable order of the attributes is not considered. Correlation, multiple regression, factor analysis, cluster analysis, structural equation modelling, and hierarchical multiple regression[2, 7, 24, 27] can be used to determine the relationships among the attributes of health questionnaires. Certain attributes can indeed be predicted by other attributes using hierarchical logistic regression, correlation analysis, and binary logistic stepwise regression[1, 3, 6, 11, 14]. However, each study predicts only one attribute rather than multiple attributes simultaneously, and the attributes involved in each study are incomplete.

A solid mathematical definition of the problem is given. The fast preceding questionnaire model (FPQM) is proposed to solve the problem in five steps. First, the influence of one attribute on all other attributes is defined and calculated. Second, we traverse every investigation attribute and choose the attribute with the largest influence as the best attribute to split. Third, we create the FPQM with the best attribute. We traverse every value of the attribute, and the sub-dataset corresponding to the value is used to obtain the sub-model recursively. Then, the sub-model is attached to the full model, and the full model is obtained when the recursion ends. Fourth, the created FPQM is used for the real investigation. The value is directly asked for at the beginning of the real investigation because there is no prior information about the respondent. Certain investigation attributes can be inferred after sufficient information has been accumulated. At that point, the confidence level is greater than the given threshold, which means that the attribute does not need to be asked about. Fifth, we calculate the evaluation metrics and evaluate the FPQM.

This paper is organized as follows. Section 2 reviews related work. The fast preceding questionnaire model (FPQM) is introduced in Section 3. First, a solid mathematical definition of the problem is given. Then, an influence calculation formula, the best attribute to split choosing algorithm (BASCA), the fast preceding questionnaire model creating algorithm (FPQMCA), the model used for real investigation algorithm (MURIA), and the model evaluation algorithm (MEA) are presented. Section 4 shows the experimental results, presenting the experimental data; the evaluation metrics; the overall results of the FPQM; the comparison experiment with Expert Knowledge, Rough Set, and C4.5; and the factor analysis, which covers the number of elderly, the number of investigation attributes, and the threshold. Section 5 concludes the paper.

2. Related work

2.1 Attribute reduction

Prieto et al.[22] present a parallel reduction of a 38-attribute questionnaire, the Nottingham Health Profile (NHP), to empirically compare Classical Test Theory (CTT) and Rasch Analysis (RA) results. CTT results in 20 attributes (4 dimensions), whereas RA results in 22 attributes (2 dimensions). Moreover, the attribute-total correlation ranges from 0.45 to 0.75 for NHP20 and from 0.46 to 0.68 for NHP22, while the reliability ranges from 0.82 to 0.93 and from 0.87 to 0.94, respectively.

Fernandez and Boyle [10] reduce and reorganize the McGill Pain Questionnaire (MPQ) using a 3-step decision rule for affective and evaluative descriptors of pain. With a minimum absolute frequency of 17 and a minimum relative frequency of 1/2 as the threshold values, the words of the MPQ are reduced from 78 to fewer than 20 on average. This reduction leads to a negligible loss of information transmitted. Moreover, Kitisomprayoonkul et al. [15] develop the Thai Short-Form McGill Pain Questionnaire (Th-SFMPQ).

Rosen et al. [25] develop an abridged five-attribute version (IIEF-5) of the 15-attribute International Index of Erectile Function (IIEF) to diagnose the presence and severity of erectile dysfunction (ED). The five attributes are selected based on the ability to identify the presence or absence of ED and on adherence to the National Institutes of Health’s definition of ED. The IIEF-5 possesses favorable properties for detecting the presence and severity of ED.

Badia et al. [4] achieve a qualitative and quantitative reduction in the 179 expressions of the bone metastasis quality of life questionnaire (BOMET-QOL) with respect to clarity, frequency and importance with 15 experts. This phase, which is performed in two steps, results in the 35-attribute version of the BOMET-QOL. The initial reduction yields a 25-attribute questionnaire via factorial analysis. Similarly, the BOMET-QOL-25 is reduced to an integrated version of 10 attributes through a sample of 263 oncology patients. The BOMET-QOL is an accurate, reliable and precise 10-attribute instrument for assessing HRQOL.

Nijsten et al. [21] test and reduce Skindex-29 to Skindex-17 using Rasch Analysis. The Rasch Analysis of the combined emotion and social functioning subscale of Skindex-29 results in a 12-attribute psychosocial subscale. A total of five of the seven attributes are retained in a symptom subscale. Classical psychometric properties, such as the response distribution, attribute-rest correlation, attribute complexity, and internal consistency, of the two subscales of Skindex-17 are at least adequate. Skindex-17 is a Rasch-reduced version of Skindex-29, with two independent scores that can be used for the measurement of health-related quality of life (HRQOL) for dermatological patients.

The studies in [4, 10, 15, 21, 22, 25] remove some attributes directly and develop qualitative and quantitative reductions of health questionnaires using Classical Test Theory (CTT), Rasch Analysis (RA), decision rules, or experts. These questionnaires include the Nottingham Health Profile (NHP), McGill Pain Questionnaire (MPQ), 15-attribute International Index of Erectile Function (IIEF), bone metastasis quality of life questionnaire (BOMET-QOL), and Skindex-29. However, the removed attributes can provide additional information, and their values can be predicted from the attributes that remain after reduction. Meanwhile, a more reasonable order of these attributes is not considered.

2.2 Relationships among attributes

Dima [7] studies the interrelations between acceptance, emotions, illness perceptions and health status. The confirmatory analysis (employing a variety of statistical procedures, from correlation to multiple regression, factor analysis, cluster analysis and structural equation modelling) largely confirms the expected relations within and between domains and is also informative regarding the most suitable data reduction methods. An additional exploratory analysis focuses on identifying the comparative characteristics of acceptance, emotions, and illness perceptions in predicting health status metrics.

Arnow et al. [2] provide estimates of the prevalence and strength of association between major depression and chronic pain in a primary care population and examine the clinical burden associated with the two conditions alone and together. Data are collected by questionnaires assessing major depressive disorder (MDD), chronic pain, pain-related disability, somatic symptom severity, panic disorder, other anxiety, probable alcohol abuse, and health-related quality of life (HRQL). The instruments include the Patient Health Questionnaire, SF-8, and the Graded Chronic Pain Questionnaire. The conclusions are that chronic pain is common among those with MDD and that comorbid MDD and disabling chronic pain are associated with a greater clinical burden than is MDD alone.

Rippentrop et al. [24] seek to better understand the relationships among religion/spirituality and physical health, mental health, and pain in 122 patients with chronic musculoskeletal pain. Hierarchical multiple regression analyses reveal significant associations between components of religion/spirituality and physical and mental health. Forgiveness, negative religious coping, daily spiritual experiences, religious support, and self-rankings of religious/spiritual intensity significantly predict mental health status. Religion/spirituality is unrelated to pain intensity and life interference due to pain. Religion/spirituality may have both costs and benefits for the health of those with chronic pain.

Vines et al. [27] determine the relationships between pain perceptions, immune function, depression and health behaviors and examine the effects of chronic pain on immune function using depression and health behaviors as covariates. Pain perceptions show positive significant correlations with depression ($P=0.01$) and total percent of NK cells ($P=0.04$). Depression and health behaviors are negatively correlated ($P=0.01$). Positive associations are observed for depression and 2 PHA mitogen levels ($P<0.05$). The immune function of patients with chronic pain is significantly higher than in the no-pain comparison group. Pain perceptions may have a deleterious effect on enumerative NK cell measures and depression levels.

Most of the attributes mentioned in [2, 7, 24, 27] are included in Table 4, such as acceptance, emotions, illness perceptions, and health status; depression, chronic pain, and clinical burden; religion/spirituality and physical health, mental health, and pain; and pain perceptions, immune function, depression and health behaviors. The investigation attributes in Table 4 differ from the attributes mentioned in the literature only in how they are expressed: the attributes mentioned in the literature are more conceptual. Applied methods include correlation, multiple regression, factor analysis, cluster analysis, structural equation modelling, and hierarchical multiple regression. The literature shows that relationships among these attributes do exist. However, the attributes covered by the relationships in each study are incomplete, and the relationships have not been well utilized to provide results of interest, such as predictions.

2.3 Prediction

Kersh et al. [14] use psychosocial and health status variables independently to predict health care seeking for fibromyalgia. Subjects are administered 14 measures, which produce six domains of variables: background demographics and pain duration; psychiatric morbidity; and personality, environmental, cognitive, and health status factors. These domains are input into 4 different hierarchical logistic regression analyses to predict the status as patient or non-patient. The full regression model is statistically significant ($P<0.0001$) and correctly identifies 90.7% of the subjects, with a sensitivity of 92.4% and a specificity of 87.2%.

Aoyama et al. [1] use physical and functional factors in activities of daily living to predict falls in community-dwelling older women. Correlation analysis investigating associations among the scores of assessment scales and actual measurements of muscle strength and balance shows that there are significant correlations between handgrip strength and the Falls Efficacy Scale, Functional Reach test, Timed Up and Go test, Berg Balance Scale, Motor Fitness Scale, and Motor Functional Independence Measure in fallers and non-fallers. A binary logistic stepwise regression analysis reveals that only an inability of “being able to go up and down the staircase” in the Motor Fitness Scale remains a significant variable to predict falls.

Aydeniz et al. [3] predict falls in the elderly with physical, functional and sociocultural parameters. Falls are common in patients with weakness, fatigue, dizziness, and swelling in the legs and in subjects with appetite loss. Fallers have lower functional status than do non-fallers ($p=0.028$). In addition, fallers have more depressive symptoms than do non-fallers ($p=0.019$). The quality of life (NHP) subgroups, especially physical activity, energy level and emotional reaction, differ between the groups ($p=0.016$, $0.015$, and $0.005$, respectively). Disability and mental status are similar in groups ($p=0.006$). Musculoskeletal problems, functional status and social status might be contributors to falls.

Gatz et al. [11] use depressive symptoms to predict Alzheimer’s disease and dementia. The Total Center for Epidemiologic Studies Depression (CES-D) score is a significant predictor of AD and dementia when categorized as a dichotomous variable according to the cutoff scores of 16 and 17; a CES-D cutoff of 21 is a significant predictor of AD and a marginally significant predictor of dementia. When analyzed as a continuous variable, the CES-D score is marginally predictive of AD and dementia. Neither participant-reported history of depression nor participant-reported duration of depression is significant in predicting AD or dementia.

Cruice et al. [6] predict social participation in older adults with personal factors, communication and vision. Assessments are individually conducted in a face-to-face interview situation with the primary researcher, who is a speech pathologist. Social participation is shown to be associated with vision, communication activities, age, education and emotional health. Naming and hearing impairments are not reliable predictors of social participation. It is concluded that professionals interested in maintaining and improving the social participation of older people should strongly consider these predictors in community-directed interventions.

Most of the attributes mentioned in [1, 3, 6, 11, 14] are also included in Table 4 such as psychosocial and health status variables and health care; physical, functional and sociocultural parameters and falls; depressive symptoms, Alzheimer’s disease and dementia; and personal factors, communication and vision, and social participation. Health care, falls, Alzheimer’s disease, dementia, and social participation are predicted using hierarchical logistic regression, correlation analysis, and binary logistic stepwise regression, respectively. The literature here is illustrative of the fact that certain attributes can indeed be predicted by other attributes. However, one study only predicts one attribute, not multiple attributes simultaneously. Meanwhile, the involved attributes in each study are also incomplete. A sufficient prediction between complete attributes can be studied.

The relation between the related works and this research, and the motivation drawn from them, are explained here. The studies in [4, 10, 15, 21, 22, 25] show that certain attributes in Table 4 can be reduced directly. The information of these attributes is redundant and contained in other attributes. The studies in [2, 7, 24, 27] provide further evidence that there is an inherent relationship between these attributes. The studies in [1, 3, 6, 11, 14] further show that certain attributes can be predicted by other attributes because of the underlying relationship. All the related works form the foundation of this research. The proposed fast preceding questionnaire model (FPQM) can achieve state-of-the-art performance only when there is an inherent relationship between attributes. If the relationship does not exist at all, no method can predict certain attributes from others. The relationship is the foundation of all possible methods, including the FPQM. The studies in [4, 10, 15, 21, 22, 25] reduce some attributes directly, while the FPQM predicts the values of the attributes and preserves these attributes. In addition, in [1, 3, 6, 11, 14], each study predicts only one attribute, while the FPQM can predict multiple attributes simultaneously.

3. Fast preceding questionnaire model (FPQM)

A solid mathematical definition of the problem is given, and the fast preceding questionnaire model (FPQM) is proposed to solve the problem in five steps.

Step 1: Calculate the influence

We calculate, in order, the confidence level of the attribute by taking a value under the condition of another attribute taking a value, the influence of the attribute taking a value on another attribute, the influence of the attribute on another attribute, the influence of the attribute on all other attributes, and the attribute that has the largest influence on all other attributes.

Step 2: Choose the best attribute to split

We traverse every investigation attribute, every other attribute, every value of the attribute, and every value of the other attribute to calculate every influence. Finally, the influence of the investigation attribute on all other attributes can be calculated. Then, we logically choose the best attribute that has the largest influence on all other attributes.

Step 3: Create the FPQM

After the best attribute to split is chosen, we traverse every value of the attribute, and the sub-model is obtained recursively from the sub-dataset corresponding to the value. Then, we attach the sub-model to the full model, and the full model is obtained when the recursion ends.

Step 4: Use the FPQM for real investigation

Now, the FPQM can be used to investigate a new respondent. At the beginning of the real investigation, there is no prior information about the respondent; therefore, we ask for the value directly. After sufficient information has been accumulated, some investigation attributes can be inferred. If the confidence level is larger than the given threshold, then the attribute does not need to be asked about.

Step 5: Evaluate the model

After the FPQM is used to investigate the new respondent, the evaluation metrics can be calculated. The FPQM can be evaluated based on these metrics.

Figure 1. Illustration of the five steps.

The five steps are also presented in the form of a graph, as shown in Fig. 1. Step 1: Calculate the influence with Eqs (1)–(5). Step 2: Choose the best attribute to split with Algorithm 3.3 and the calculated influence in Step 1. Step 3: Create the FPQM with Algorithm 3.4, and at every recursion step, call Algorithm 3.3 to choose the best attribute to split. Step 4: Use the FPQM for the real investigation with Algorithm 3.5 after the FPQM is created in Step 3. Step 5: Evaluate the model with Algorithm 3.6.

3.1 Problem definition

Definition 1.

Let $U=\{U_{1},U_{2},\ldots,U_{m}\}$ be the collection of all individuals who are investigated in the training dataset, where $U_{i}(1\leqslant i\leqslant m)$ is the $i$ -th individual being investigated.

Definition 2.

Let $\tilde{U}=\{\tilde{U_{1}},\tilde{U_{2}},\ldots,\tilde{U_{\tilde{m}}}\}$ be the collection of all individuals in the testing dataset, where $\tilde{U_{i}}(1\leqslant i\leqslant\tilde{m})$ is the $i$ -th individual and $\tilde{U}\cap U=\emptyset$ .

Definition 3.

Let $V=\{V_{1},V_{2},\ldots,V_{n}\}$ be the collection of investigation attributes, where $V_{j}$ is the $j$ -th investigation attribute.

Definition 4.

Let $W=\{W_{1},W_{2},\ldots,W_{n}\}$ be the collection of all possible values on all investigation attributes, where $W_{j}$ is the collection of all possible values on $V_{j}$ .

Definition 5.

Let $N=\{N_{1},N_{2},\ldots,N_{n}\}$ , where $N_{j}$ is the number of elements in $W_{j}$ . $N$ will be used in the time complexity analysis of the following four algorithms.

Definition 6.

Let $D=\{D_{ij}\}(1\leqslant i\leqslant m,1\leqslant j\leqslant n)$ be the matrix of all real values of all individuals, and $D$ is the training dataset, where $D_{ij}$ is the real value of individual $U_{i}$ on the investigation attribute $V_{j}$ and $D_{ij}\in W_{j}$ .

Definition 7.

Let $R=\{R_{ij}\}(1\leqslant i\leqslant\tilde{m},1\leqslant j\leqslant n)$ be the matrix of all real values, where $R$ is the testing dataset, in which $R_{ij}$ is the real value of $\tilde{U_{i}}$ on $V_{j}$ and $R_{ij}\in W_{j}$ . $R$ can also be represented as $R=\{R_{i}\}(1\leqslant i\leqslant\tilde{m})$ ; $R_{i}=\{R_{ij}\}(1\leqslant j\leqslant n)$ , where $R_{i}$ is the vector of real values of $\tilde{U_{i}}$ .

Definition 8.

Let $R^{\prime}=\{R_{ij}^{\prime}\}(1\leqslant i\leqslant\tilde{m},1\leqslant j\leqslant n)$ be the matrix of all final values, where $R_{ij}^{\prime}$ is the final value of $\tilde{U_{i}}$ on $V_{j}$ and $R_{ij}^{\prime}\in W_{j}$ . $R^{\prime}$ can also be represented as $R^{\prime}=\{R_{i}^{\prime}\}(1\leqslant i\leqslant\tilde{m})$ ; $R_{i}^{\prime}=\{R_{ij}^{\prime}\}(1\leqslant j\leqslant n)$ , where $R_{i}^{\prime}$ is the vector of the final values of $\tilde{U_{i}}$ .

Definition 9.

Let $K=\{K_{ij}\}(1\leqslant i\leqslant\tilde{m},1\leqslant j\leqslant n)$ be the matrix of indication values showing whether the final value $R_{ij}^{\prime}$ is equal to the real value $R_{ij}$ . $K_{ij}\in\{0,1\}$ , where $K_{ij}=1$ when $R_{ij}^{\prime}=R_{ij}$ and $K_{ij}=0$ when $R_{ij}^{\prime}\neq R_{ij}$ .

Definition 10.

Let $P=\{P_{ij}\}(1\leqslant i\leqslant\tilde{m},1\leqslant j\leqslant n)$ be the matrix of the confidence levels of $\tilde{U_{i}}$ taking the final values $R_{ij}^{\prime}$ . $P_{ij}\in[0,1]$ , where $P_{ij}$ is the confidence level of $\tilde{U_{i}}$ taking $R_{ij}^{\prime}$ on $V_{j}$ . $P$ can also be represented as $P=\{P_{i}\}(1\leqslant i\leqslant\tilde{m})$ ; $P_{i}=\{P_{ij}\}(1\leqslant j\leqslant n)$ , where $P_{i}$ is the vector of the confidence levels of $\tilde{U_{i}}$ .

Definition 11.

Let $I=\{I_{ij}\}(1\leqslant i\leqslant\tilde{m},1\leqslant j\leqslant n)$ be the matrix of indication values showing whether the value $R_{ij}^{\prime}$ is predicted from other already known attributes. $I_{ij}\in\{0,1\}$ , where $I_{ij}=1$ denotes that $R_{ij}^{\prime}$ is predicted and $I_{ij}=0$ denotes that $R_{ij}^{\prime}=R_{ij}$ by asking $\tilde{U_{i}}$ on $V_{j}$ directly. Note that $I_{ij}=1$ when $P_{ij}\geqslant\sigma$ , and $I_{ij}=0$ when $P_{ij}<\sigma$ , where $\sigma$ is the given threshold. $I$ can also be represented as $I=\{I_{i}\}(1\leqslant i\leqslant\tilde{m})$ ; $I_{i}=\{I_{ij}\}(1\leqslant j\leqslant n)$ , where $I_{i}$ is the vector of the indication values of $\tilde{U_{i}}$ .

Definition 12.

Let $O=\{O_{1},O_{2},\ldots,O_{\tilde{m}}\}$ be the collection of all reasonable orders in which individuals should be investigated, where $O_{i}$ is a reasonable order in which $\tilde{U_{i}}$ should be investigated. $O_{i}=\{j_{1},j_{2},\ldots,j_{n}\}$ is a permutation of $\{1,2,\ldots,n\}$ . $O_{i_{1}}$ may differ from $O_{i_{2}}$ (and most likely does) when $i_{1}\neq i_{2}$ .

Definition 13.

Let $M$ be the fast preceding questionnaire model. $M$ is a tree structure, and $M$ will determine $R^{\prime}$ , $I$ , and $O$ .

Definition 14.

Let $C$ be the collection of all appearing confidence levels when creating $M$ , where $C$ will determine $P$ .

Definition 15.

Let $S=\{U,V,W,D,R,R^{\prime},K,P,I,O,M,C\}$ be the system of the fast preceding questionnaire model.

Definition 16.

Let $\bar{S}$ be the space of all possible questionnaire models.

Definition 17.

Average accuracy rate (AAR): $\textit{AAR}=\frac{1}{\tilde{m}}\sum_{i=1}^{\tilde{m}}\frac{\sum_{j=1}^{n}K_{ij}*I_{ij}}{\sum_{j=1}^{n}I_{ij}}$ . AAR describes how accurate the model can be. AAR can also be represented as $\textit{AAR}=\frac{1}{\tilde{m}}\sum_{i=1}^{\tilde{m}}\textit{AR}_{i}$ ; $\textit{AR}_{i}=\frac{\sum_{j=1}^{n}K_{ij}*I_{ij}}{\sum_{j=1}^{n}I_{ij}}$ , where $\textit{AR}_{i}$ is the accuracy rate of $\tilde{U_{i}}$ .

Definition 18.

Average reduction rate (ARR): $\textit{ARR}=\frac{1}{\tilde{m}}\sum_{i=1}^{\tilde{m}}\frac{\sum_{j=1}^{n}I_{ij}}{n}$ . ARR describes how well the model can accelerate the questionnaire. ARR can also be represented as $\textit{ARR}=\frac{1}{\tilde{m}}\sum_{i=1}^{\tilde{m}}\textit{RR}_{i}$ ; $\textit{RR}_{i}=\frac{\sum_{j=1}^{n}I_{ij}}{n}$ , where $\textit{RR}_{i}$ is the reduction rate of $\tilde{U_{i}}$ .

Definition 19.

Average $F_{\beta}$ -Measure: $\textit{AF}_{\beta}=\frac{1}{\tilde{m}}\sum_{i=1}^{\tilde{m}}\frac{(\beta^{2}+1)*\textit{AR}_{i}*\textit{RR}_{i}}{\beta^{2}*\textit{AR}_{i}+\textit{RR}_{i}}$ , where $\beta$ is a given parameter. $\textit{AF}_{\beta}$ describes a balance between AAR and ARR.

Definition 20.

The problem is defined as follows.

$\max\ \textit{AF}_{\beta}\quad\text{subject to}\quad S\in\bar{S}$
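As an illustration of the three evaluation metrics defined above, the following minimal Python sketch computes AAR, ARR, and $\textit{AF}_{\beta}$ from the indicator matrices $K$ and $I$. The function name `evaluate` and the toy matrices are hypothetical. The definitions leave $\textit{AR}_{i}$ undefined when no attribute of a respondent is predicted; the sketch assumes $\textit{AR}_{i}=0$ in that case, which is an assumption and not part of the definitions.

```python
from typing import List, Tuple


def evaluate(K: List[List[int]], I: List[List[int]], beta: float = 1.0) -> Tuple[float, float, float]:
    """Return (AAR, ARR, AF_beta) for 0/1 indicator matrices K and I."""
    m = len(K)
    ar_sum = rr_sum = f_sum = 0.0
    for k_row, i_row in zip(K, I):
        n = len(i_row)
        predicted = sum(i_row)                              # sum_j I_ij
        correct = sum(k * i for k, i in zip(k_row, i_row))  # sum_j K_ij * I_ij
        # Assumption: AR_i = 0 when nothing was predicted (0/0 in the definition).
        ar_i = correct / predicted if predicted else 0.0
        rr_i = predicted / n
        denom = beta ** 2 * ar_i + rr_i
        # Assumption: the F-measure of a respondent is 0 when its denominator is 0.
        f_i = (beta ** 2 + 1) * ar_i * rr_i / denom if denom else 0.0
        ar_sum += ar_i
        rr_sum += rr_i
        f_sum += f_i
    return ar_sum / m, rr_sum / m, f_sum / m


# Toy example: two respondents, four attributes.
K = [[1, 1, 0, 1], [1, 1, 1, 1]]  # final value equals real value?
I = [[0, 1, 1, 1], [0, 0, 1, 1]]  # value predicted (1) or asked (0)?
aar, arr, af = evaluate(K, I, beta=1.0)
print(aar, arr, af)
```

Note that $\textit{AF}_{\beta}$ is averaged per respondent, so it is generally not the same as applying the $F_{\beta}$ formula to AAR and ARR directly.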

3.2 Influence calculation formula

To create the model from the training dataset $D$ , the influence of one investigation attribute on all others should be calculated. The influence calculation formula is given in Definitions 21–25 when the depth of the created model reaches $t$ . The influence of the investigation attribute depends on the influence of the values.

Definition 21.

The confidence level of the investigation attribute $V_{j_{k_{2}}}^{t}$ taking the value $v_{j_{k_{2}}}^{t}$ under the condition of the investigation attribute $V_{j_{k_{1}}}^{t}$ taking the value $v_{j_{k_{1}}}^{t}$ when the previous $t-1$ layer values $V^{1},V^{2},\ldots,V^{t-1}$ are already known.

$\displaystyle P(V_{j_{k_{2}}}^{t}=v_{j_{k_{2}}}^{t}|V_{j_{k_{1}}}^{t}=v_{j_{k_{1}}}^{t};V^{1},V^{2},\ldots,V^{t-1})=\frac{N(V_{j_{k_{2}}}^{t}=v_{j_{k_{2}}}^{t},V_{j_{k_{1}}}^{t}=v_{j_{k_{1}}}^{t};V^{1},V^{2},\ldots,V^{t-1})}{N(V_{j_{k_{1}}}^{t}=v_{j_{k_{1}}}^{t};V^{1},V^{2},\ldots,V^{t-1})}$ (1)

where $V_{j_{k_{1}}}^{t}\in V$ , $V_{j_{k_{2}}}^{t}\in V$ , $v_{j_{k_{1}}}^{t}\in W_{j_{k_{1}}}$ , and $v_{j_{k_{2}}}^{t}\in W_{j_{k_{2}}}$ . $V^{1}$ is the first-layer value of all investigation attributes $V$ , $V^{2}$ is the second-layer value of $V$ , etc. $N(V_{j_{k_{2}}}^{t}=v_{j_{k_{2}}}^{t},V_{j_{k_{1}}}^{t}=v_{j_{k_{1}}}^{t};V^{1},V^{2},\ldots,V^{t-1})$ is the number of individuals in $U$ who take the values $v_{j_{k_{1}}}^{t}$ and $v_{j_{k_{2}}}^{t}$ on the investigation attributes $V_{j_{k_{1}}}^{t}$ and $V_{j_{k_{2}}}^{t}$ , respectively. $N(V_{j_{k_{1}}}^{t}=v_{j_{k_{1}}}^{t};V^{1},V^{2},\ldots,V^{t-1})$ is the number of individuals in $U$ who take the value $v_{j_{k_{1}}}^{t}$ on $V_{j_{k_{1}}}^{t}$ .

Definition 22.

The influence of $V_{j_{k_{1}}}^{t}$ taking $v_{j_{k_{1}}}^{t}$ on $V_{j_{k_{2}}}^{t}$ when the previous $t-1$ layer values $V^{1},V^{2},\ldots,V^{t-1}$ are already known.

$\displaystyle \textit{INF}(V_{j_{k_{2}}}^{t}|V_{j_{k_{1}}}^{t}=v_{j_{k_{1}}}^{t};V^{1},V^{2},\ldots,V^{t-1})=\sum_{v_{j_{k_{2}}}^{t}\in W_{j_{k_{2}}}}P(V_{j_{k_{2}}}^{t}=v_{j_{k_{2}}}^{t}|V_{j_{k_{1}}}^{t}=v_{j_{k_{1}}}^{t};V^{1},V^{2},\ldots,V^{t-1})^{2}$ (2)

where $P(V_{j_{k_{2}}}^{t}=v_{j_{k_{2}}}^{t}|V_{j_{k_{1}}}^{t}=v_{j_{k_{1}}}^{t};V^{1},V^{2},\ldots,V^{t-1})$ is defined in Definition 21. Notice that the influence is not defined as the plain sum $\sum_{v_{j_{k_{2}}}^{t}\in W_{j_{k_{2}}}}P(V_{j_{k_{2}}}^{t}=v_{j_{k_{2}}}^{t}|V_{j_{k_{1}}}^{t}=v_{j_{k_{1}}}^{t};V^{1},V^{2},\ldots,V^{t-1})$ because this sum is always equal to 1 and therefore carries no information.

Definition 23.

The influence of $V_{j_{k_{1}}}^{t}$ on $V_{j_{k_{2}}}^{t}$ when the previous $t-1$ layer values $V^{1},V^{2},\ldots,V^{t-1}$ are already known.

$\displaystyle \textit{INF}(V_{j_{k_{2}}}^{t}|V_{j_{k_{1}}}^{t};V^{1},V^{2},\ldots,V^{t-1})=\sum_{v_{j_{k_{1}}}^{t}\in W_{j_{k_{1}}}}\Bigl[P(V_{j_{k_{1}}}^{t}=v_{j_{k_{1}}}^{t};V^{1},V^{2},\ldots,V^{t-1})\ast\textit{INF}(V_{j_{k_{2}}}^{t}|V_{j_{k_{1}}}^{t}=v_{j_{k_{1}}}^{t};V^{1},V^{2},\ldots,V^{t-1})\Bigr]$ (3)

where $P(V_{j_{k_{1}}}^{t}=v_{j_{k_{1}}}^{t};V^{1},V^{2},\ldots,V^{t-1})$ is the confidence level of $V_{j_{k_{1}}}^{t}$ taking $v_{j_{k_{1}}}^{t}$ .

Definition 24.

The influence of $V_{j_{k_{1}}}^{t}$ on all other investigation attributes $\Delta$ when the previous $t-1$ layer values $V^{1},V^{2},\ldots,V^{t-1}$ are already known.

$\displaystyle \textit{INF}(\Delta|V_{j_{k_{1}}}^{t};V^{1},V^{2},\ldots,V^{t-1})=\sum_{V_{j_{k_{2}}}^{t}\in V,\,k_{2}\neq k_{1}}\textit{INF}(V_{j_{k_{2}}}^{t}|V_{j_{k_{1}}}^{t};V^{1},V^{2},\ldots,V^{t-1})^{2}$ (4)

where $\Delta=V-\{V_{j_{k_{1}}}^{t}\}$ .

Definition 25.

The investigation attribute $V_{*}^{t}$ that has the largest influence on all other investigation attributes $\Delta$ .

$\displaystyle V_{*}^{t}=\underset{V_{j_{k_{1}}}^{t}\in V}{\arg\max}\ \textit{INF}(\Delta|V_{j_{k_{1}}}^{t};V^{1},V^{2},\ldots,V^{t-1})$ (5)

3.3 Best attribute to split choosing algorithm (BASCA)

When creating the FPQM, it is necessary to choose the best attribute to split. When the depth of the created model reaches $t$ , we traverse every investigation attribute; then, we traverse every other investigation attribute and calculate the influence of the investigation attribute on all other attributes. The five calculation steps of the algorithm use Eqs (1)–(5), in that order. Then, we choose $V_{*}^{t}$ as the best attribute, namely the one that has the largest influence on $\Delta$ . The pseudocode is shown in Algorithm 3.3 when the depth of the created model reaches $t$ .

Algorithm 3.3: Best attribute to split choosing algorithm (BASCA)
Input: $D$ , $V$
Output: $V_{*}^{t}$
for each $V_{j_{k_{1}}}^{t}\in V$ do
  for each $V_{j_{k_{2}}}^{t}\in V$ with $V_{j_{k_{1}}}^{t}\neq V_{j_{k_{2}}}^{t}$ do
    for each value $v_{j_{k_{1}}}^{t}$ of $V_{j_{k_{1}}}^{t}$ do
      for each value $v_{j_{k_{2}}}^{t}$ of $V_{j_{k_{2}}}^{t}$ do
        Calculate $P(v_{j_{k_{2}}}^{t}|v_{j_{k_{1}}}^{t};V^{1},V^{2},\ldots,V^{t-1})$
        Add $P(v_{j_{k_{2}}}^{t}|v_{j_{k_{1}}}^{t};V^{1},V^{2},\ldots,V^{t-1})$ to $C$
      end for
      Calculate $\textit{INF}(V_{j_{k_{2}}}^{t}|v_{j_{k_{1}}}^{t};V^{1},V^{2},\ldots,V^{t-1})$
    end for
    Calculate $\textit{INF}(V_{j_{k_{2}}}^{t}|V_{j_{k_{1}}}^{t};V^{1},V^{2},\ldots,V^{t-1})$
  end for
  Calculate $\textit{INF}(\Delta|V_{j_{k_{1}}}^{t};V^{1},V^{2},\ldots,V^{t-1})$
end for
Calculate $V_{*}^{t}$
return $V_{*}^{t}$

$V_{*}^{t}=\textit{BASCA}(D,V)$ . The BASCA returns the investigation attribute $V_{*}^{t}$ that has the largest influence on $\Delta$ . $C$ is a global variable, and the BASCA can also be called to obtain $C$ .
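The selection performed by Algorithm 3.3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `influence_on_all` and `basca`, the tuple-based rows, and the `candidates` argument (the column indices to consider) are hypothetical, the global collection $C$ is omitted, and ties are broken in favour of the first candidate.

```python
from collections import Counter, defaultdict


def influence_on_all(rows, k1, candidates):
    """Eq. (4): influence of attribute k1 on all other candidate attributes."""
    total = len(rows)
    groups = defaultdict(list)          # rows grouped by the value of attribute k1
    for row in rows:
        groups[row[k1]].append(row)
    score = 0.0
    for k2 in candidates:
        if k2 == k1:
            continue
        inf_pair = 0.0                  # Eq. (3): influence of k1 on k2
        for group in groups.values():
            counts = Counter(row[k2] for row in group)
            # Eq. (2): sum of squared conditional confidences from Eq. (1)
            inf_value = sum((c / len(group)) ** 2 for c in counts.values())
            inf_pair += (len(group) / total) * inf_value
        score += inf_pair ** 2          # squared, as Eq. (4) is written in the text
    return score


def basca(rows, candidates):
    """Eq. (5): candidate attribute with the largest influence on the others."""
    return max(candidates, key=lambda k: influence_on_all(rows, k, candidates))


# Toy data: attributes 0 and 1 always agree, attribute 2 is unrelated.
rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
print([influence_on_all(rows, k, [0, 1, 2]) for k in [0, 1, 2]])  # [1.25, 1.25, 0.5]
print(basca(rows, [0, 1, 2]))                                     # 0
```

In the toy data, asking attribute 0 (or 1) determines its partner completely, so both score higher than the unrelated attribute 2.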

3.4 Fast preceding questionnaire model creating algorithm (FPQMCA)

Now, the FPQM can be created with the above groundwork. After $V_{*}$ is chosen as the best attribute to split, we traverse every value $v_{*}$ of $V_{*}$ , and the sub-model can be obtained with the sub-dataset corresponding to $v_{*}$ recursively. Then, we attach the sub-model to the full model. Algorithm 3.4 is the pseudocode.

Algorithm 3.4 Fast preceding questionnaire model creating algorithm (FPQMCA)
Input: $D$ , $V$ , Index list: $L$ , $\beta$
Output: Fast preceding questionnaire model: $M$
Create a node $M$
Let $N(L(0))$ be the number of 0s in $L$
if $N(L(0))==1$ then
    Let $V(L(0))$ be the investigation attribute whose corresponding index in $L$ is 0
    return $M$ as a leaf node labeled with $V(L(0))$
end if
$V_{*}=\textit{BASCA}(D,V)$
Label node $M$ with $V_{*}$
Let $L(V^{-1}(V_{*}))$ be the corresponding index in $L$ of $V_{*}$
$L(V^{-1}(V_{*}))=1$
for each value $v_{*}$ of $V_{*}$ do
    Let $D_{v_{*}}$ be the set of data tuples in $D$ satisfying $v_{*}$ on $V_{*}$
    $M_{v_{*}}=\textit{FPQMCA}(D_{v_{*}},V,L)$
    Attach the node $M_{v_{*}}$ to node $M$
end for
return $M$

$L=\textit{List(0)}$ , and $L$ is initialized with a zero vector. Then, the FPQMCA can be called to obtain the fast preceding questionnaire model $M$ . $M=\textit{FPQMCA(D,V,L)}$ .
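The recursion described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `influence`, `best_attribute`, and `build` are hypothetical, the index list $L$ is replaced by an explicit list of not-yet-chosen attribute indices, ties are broken by taking the first maximal attribute, and the total influence is taken as the plain sum used in the worked example of Section 3.7.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Example training data from Table 1 (rows U1..U4, columns V1..V5).
D = [
    (0, 1, 0, 1, 1),
    (1, 2, 0, 0, 1),
    (1, 0, 1, 0, 1),
    (1, 0, 1, 1, 0),
]
NAMES = ["Education", "Income", "Social Skills", "Work Ability", "Communication"]


def influence(rows, src, dst):
    """Frequency-weighted sum of squared conditional probabilities of dst given src."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[src]].append(row[dst])
    total = Fraction(0)
    for values in groups.values():
        weight = Fraction(len(values), len(rows))
        counts = Counter(values)
        total += weight * sum(Fraction(c, len(values)) ** 2 for c in counts.values())
    return total


def best_attribute(rows, remaining):
    """Remaining attribute with the largest summed influence (assumed tie-break: first)."""
    return max(
        remaining,
        key=lambda a: sum(influence(rows, a, b) for b in remaining if b != a),
    )


def build(rows, remaining):
    """Recursively build a nested-dict model over the remaining attribute indices."""
    if len(remaining) == 1:  # one attribute left: it becomes a leaf
        return {"attr": remaining[0], "children": {}}
    best = best_attribute(rows, remaining)
    rest = [a for a in remaining if a != best]
    children = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        children[value] = build(subset, rest)  # sub-model for this branch
    return {"attr": best, "children": children}


model = build(D, list(range(5)))
print(NAMES[model["attr"]], sorted(model["children"]))
```

On the Table 1 data the root is Income with branches for the values 0, 1, and 2, which agrees with the walkthrough in Section 3.7. Only values that actually occur in the current sub-dataset receive a branch in this sketch.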

3.5 Model used for real investigation algorithm (MURIA)

With the created FPQM, the new person in the testing dataset can be investigated quickly. At the beginning of the real investigation, there is no information about the respondent; therefore, all we can do is ask about the attribute directly. After sufficient information has been accumulated, some investigation attributes can be inferred; otherwise, we continue asking about attributes.

Let Ind be the index indicating whether the current investigation attribute is the top attribute. $\textit{Ind}=0$ indicates that it is the top attribute, and $\textit{Ind}=1$ indicates that it is not. If the current investigation attribute is the top attribute, there is no information about the respondent; therefore, the attribute cannot be predicted.

$R^{\prime}$ , $I$ , $O$ , and $P$ are global variables that have been defined in Definitions 8, 10, 11, and 12, respectively. The pseudocode is shown in Algorithm 3.5.

Algorithm 3.5 Model used for real investigation algorithm (MURIA)
Input: $R_{i}$ , $V$ , $M$ , threshold of confidence level: $\sigma$ , Index: Ind
Output: Last node: LN
Let $M(1)$ be the top investigation attribute index of $M$ .
Let $V(M(1))$ be the top investigation attribute of $M$ .
Let $R_{i}(M(1))$ be the real value of $R_{i}$ on $V(M(1))$ .
Let ${R_{i}}^{\prime}(M(1))$ be the final value of ${R_{i}}^{\prime}$ on $V(M(1))$ .
Let $I_{i}(M(1))$ be the indication value of $I_{i}$ on $V(M(1))$ .
Let $P_{i}(M(1))$ be the confidence level of $P_{i}$ on $V(M(1))$ .
Let $O_{i}(j_{1})$ be the current attribute of $O_{i}$ .
if $Ind==0$ then
    ${R_{i}}^{\prime}(M(1))=R_{i}(M(1))$
    $I_{i}(M(1))=0$
    $P_{i}(M(1))=1$
    $Ind=1$
    $O_{i}(j_{1})=M(1)$
end if
Let $M_{s}$ be the sub-model of $M$ when $\tilde{U_{i}}$ takes ${R_{i}}^{\prime}(M(1))$ .
Let $M_{s}(1)$ be the top investigation attribute index of $M_{s}$ .
Let $V(M_{s}(1))$ be the top investigation attribute of $M_{s}$ .
Let $R_{i}(M_{s}(1))$ be the real value of $R_{i}$ on $V(M_{s}(1))$ .
Let ${R_{i}}^{\prime}(M_{s}(1))$ be the final value of ${R_{i}}^{\prime}$ on $V(M_{s}(1))$ .
Let $I_{i}(M_{s}(1))$ be the indication value of $I_{i}$ on $V(M_{s}(1))$ .
Let $P_{i}(M_{s}(1))$ be the confidence level of $P_{i}$ on $V(M_{s}(1))$ .
Let $O_{i}(j_{2})$ be the next attribute of $O_{i}$ .
Let $W(M_{s}(1))$ be possible values on $M_{s}(1)$ .
Let $C(v(M_{s}(1))\mid{R_{i}}^{\prime}(M(1)))$ be the confidence level of $M_{s}(1)$ taking $v(M_{s}(1))$ under the condition of $M(1)$ taking ${R_{i}}^{\prime}(M(1))$ .
$C_{*}=\underset{v(M_{s}(1))\in W(M_{s}(1))}{max}C(v(M_{s}(1))\mid{R_{i}}^{\prime}(M(1)))$
if $C_{*}>\sigma$ then
    ${R_{i}}^{\prime}(M_{s}(1))=v(M_{s}(1))$
    $I_{i}(M_{s}(1))=1$
    $P_{i}(M_{s}(1))=C_{*}$
    $O_{i}(j_{2})=M_{s}(1)$
else
    ${R_{i}}^{\prime}(M_{s}(1))=R_{i}(M_{s}(1))$
    $I_{i}(M_{s}(1))=0$
    $P_{i}(M_{s}(1))=1$
    $O_{i}(j_{2})=M_{s}(1)$
end if
if $M_{s}\ !=a\ tree$ then
    return $M_{s}$
else
    return $\textit{MURIA}(R_{i},V,M_{s},\sigma,Ind)$
end if

$\textit{Ind}=0$ is the initial value. We traverse every testing dataset in $R$ as $R_{i}$ , and we call the algorithm $\textit{PLN}=\textit{MURIA}(R_{i},V,M,\sigma,\textit{Ind})$ to obtain $R^{\prime}$ , $I$ , $O$ , and $P$ .

3.6 Model evaluation algorithm (MEA)

Now, the FPQM should be evaluated to determine its performance. Various evaluation metrics are calculated by the model evaluation algorithm (MEA). First, $K$ can be calculated with $R$ and $R^{\prime}$ . Then, AAR, ARR, and $\textit{AF}_{\beta}$ can be obtained with $K$ and $I$ . A larger $\textit{AF}_{\beta}$ indicates a better FPQM. Algorithm 3.6 is the pseudocode.

Algorithm 3.6 Model evaluation algorithm (MEA)
Input: $R$ , $R^{\prime}$ , $I$ , $\beta$
Output: AAR, ARR, $\textit{AF}_{\beta}$
for $i\leftarrow 1$ to $\tilde{m}$ do
    for $j\leftarrow 1$ to $n$ do
        if ${R_{ij}}^{\prime}\neq R_{ij}$ then
            $K_{ij}=0$
        else
            $K_{ij}=1$
        end if
    end for
end for
Let Sum1, Sum2, Sum3, Sum4, and Sum5 be temporary variables
$\textit{Sum1}=0$
$\textit{Sum2}=0$
$\textit{Sum3}=0$
for $i\leftarrow 1$ to $\tilde{m}$ do
    $\textit{Sum4}=0$
    $\textit{Sum5}=0$
    for $j\leftarrow 1$ to $n$ do
        $\textit{Sum4}=\textit{Sum4}+K_{ij}*I_{ij}$
        $\textit{Sum5}=\textit{Sum5}+I_{ij}$
    end for
    $\textit{AR}_{i}=\textit{Sum4/Sum5}$
    $\textit{RR}_{i}=\textit{Sum5/n}$
    $F_{i}=\frac{(\beta^{2}+1)*\textit{AR}_{i}*\textit{RR}_{i}}{\beta^{2}*\textit{AR}_{i}+\textit{RR}_{i}}$
    $\textit{Sum1}=\textit{Sum1}+\textit{AR}_{i}$
    $\textit{Sum2}=\textit{Sum2}+\textit{RR}_{i}$
    $\textit{Sum3}=\textit{Sum3}+F_{i}$
end for
$\textit{AAR}=\textit{Sum1}/\tilde{m}$
$\textit{ARR}=\textit{Sum2}/\tilde{m}$
$\textit{AF}_{\beta}=\textit{Sum3}/\tilde{m}$
return AAR, ARR, $\textit{AF}_{\beta}$

When $\beta$ is given, $(\textit{AAR},\textit{ARR},\textit{AF}_{\beta})=\textit{MEA}(R,R^{\prime},I,\beta)$ . AAR, ARR, $AF_{\beta}$ can be obtained by calling the MEA.

3.7 An example

Here is an example to illustrate Definitions 1–25 and Algorithms 3.3–3.6.

$\displaystyle U=\{U_{1},U_{2},U_{3},U_{4},U_{5}\}$

$\displaystyle V=\{V_{1},V_{2},V_{3},V_{4},V_{5}\}=\{\textit{Education, Income, Social Skills, Work Ability, Communication}\}$

$\displaystyle W=\{W_{1},W_{2},W_{3},W_{4},W_{5}\}=\{\{0,1\},\{0,1,2\},\{0,1\},\{0,1\},\{0,1\}\}$

$\displaystyle N=\{N_{1},N_{2},N_{3},N_{4},N_{5}\}=\{2,3,2,2,2\}$

$\displaystyle D=\left[\begin{array}[]{ccccc}0&1&0&1&1\\ 1&2&0&0&1\\ 1&0&1&0&1\\ 1&0&1&1&0\end{array}\right]$

When $t=1$ , no values of the investigation attributes are known yet. By Eq. (1),

$\displaystyle P(V_{2}^{1}=0|V_{1}^{1}=0)=\frac{N(V_{2}^{1}=0,V_{1}^{1}=0)}{N(V_{1}^{1}=0)}=\frac{0}{1}=0$ (6)

$\displaystyle P(V_{2}^{1}=1|V_{1}^{1}=0)=\frac{N(V_{2}^{1}=1,V_{1}^{1}=0)}{N(V_{1}^{1}=0)}=\frac{1}{1}=1$ (7)

$\displaystyle P(V_{2}^{1}=2|V_{1}^{1}=0)=\frac{N(V_{2}^{1}=2,V_{1}^{1}=0)}{N(V_{1}^{1}=0)}=\frac{0}{1}=0$ (8)

By Eq. (2),

$\displaystyle P(V_{2}^{1}|V_{1}^{1}=0)=P(V_{2}^{1}=0|V_{1}^{1}=0)^{2}+P(V_{2}^{1}=1|V_{1}^{1}=0)^{2}+P(V_{2}^{1}=2|V_{1}^{1}=0)^{2}=0^{2}+1^{2}+0^{2}=1$ (9)

Similar to Eqs (6)–(9),

$\displaystyle P(V_{2}^{1}|V_{1}^{1}=1)=P(V_{2}^{1}=0|V_{1}^{1}=1)^{2}+P(V_{2}^{1}=1|V_{1}^{1}=1)^{2}+P(V_{2}^{1}=2|V_{1}^{1}=1)^{2}=\left(\frac{2}{3}\right)^{2}+0^{2}+\left(\frac{1}{3}\right)^{2}=\frac{5}{9}$ (10)

By Eq. (3),

$\displaystyle P(V_{2}^{1}|V_{1}^{1})=P(V_{1}^{1}=0)*P(V_{2}^{1}|V_{1}^{1}=0)+P(V_{1}^{1}=1)*P(V_{2}^{1}|V_{1}^{1}=1)=\frac{1}{4}*1+\frac{3}{4}*\frac{5}{9}=\frac{2}{3}$ (11)

Similar to Eqs (6)–(11),

$\displaystyle P(V_{3}^{1}|V_{1}^{1})=\frac{2}{3},\quad P(V_{4}^{1}|V_{1}^{1})=\frac{2}{3},\quad P(V_{5}^{1}|V_{1}^{1})=\frac{2}{3}$ (12)

Table 1

Example training dataset

Investigation attributes	Education	Income	Social skills	Work ability	Communication
$U_{1}$	0	1	0	1	1
$U_{2}$	1	2	0	0	1
$U_{3}$	1	0	1	0	1
$U_{4}$	1	0	1	1	0

Table 2

Example testing dataset

Investigation attributes	Education	Income	Social skills	Work ability	Communication
$\tilde{U_{1}}$	1	1	0	1	0
$\tilde{U_{2}}$	0	1	1	0	1

By Eq. (4),

$\displaystyle P(\Delta|V_{1}^{1})=P(V_{2}^{1}|V_{1}^{1})+P(V_{3}^{1}|V_{1}^{1})+P(V_{4}^{1}|V_{1}^{1})+P(V_{5}^{1}|V_{1}^{1})=\frac{2}{3}+\frac{2}{3}+\frac{2}{3}+\frac{2}{3}=\frac{8}{3}$ (13)

Similar to Eqs (6)–(13),

$\displaystyle P(\Delta|V_{2}^{1})=\frac{7}{2},\quad P(\Delta|V_{3}^{1})=\frac{11}{4},\quad P(\Delta|V_{4}^{1})=\frac{5}{2},\quad P(\Delta|V_{5}^{1})=\frac{5}{2}$ (14)

By Eq. (5),

$\displaystyle V_{*}^{1}=\underset{V_{j_{k_{1}}}^{1}\in V}{\textit{arg\ max}}\ \textit{INF}(\Delta|V_{j_{k_{1}}}^{1})=V_{2}^{1}$ (15)

Equations (6)–(15) also show how Algorithm 3.3 performs.
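The arithmetic in Eqs (6)–(15) can be checked with a short Python sketch using exact fractions. The function names below are hypothetical, and one assumption is made explicit: the total influence on $\Delta$ is taken as the plain sum over the other attributes, as written out in Eq. (13).

```python
from collections import Counter
from fractions import Fraction

# Example training data from Table 1 (rows U1..U4, columns V1..V5).
D = [
    (0, 1, 0, 1, 1),
    (1, 2, 0, 0, 1),
    (1, 0, 1, 0, 1),
    (1, 0, 1, 1, 0),
]


def cond_probs(rows, src, src_value, dst):
    """Eqs (6)-(8): P(dst = w | src = src_value) for every observed w."""
    matching = [row[dst] for row in rows if row[src] == src_value]
    return {w: Fraction(c, len(matching)) for w, c in Counter(matching).items()}


def inf_given_value(rows, src, src_value, dst):
    """Eqs (9)-(10): sum of squared conditional probabilities."""
    return sum(p ** 2 for p in cond_probs(rows, src, src_value, dst).values())


def inf_given_attr(rows, src, dst):
    """Eq. (11): weight each value of src by its relative frequency."""
    counts = Counter(row[src] for row in rows)
    return sum(
        Fraction(c, len(rows)) * inf_given_value(rows, src, v, dst)
        for v, c in counts.items()
    )


def inf_total(rows, src):
    """Eq. (13): sum over all other attributes (assumed unsquared, as in the example)."""
    n = len(rows[0])
    return sum(inf_given_attr(rows, src, dst) for dst in range(n) if dst != src)


totals = [inf_total(D, j) for j in range(5)]
best = max(range(5), key=lambda j: totals[j])  # Eq. (15), 0-based index
print([str(t) for t in totals], "best attribute index:", best)
```

The totals come out as 8/3, 7/2, 11/4, 5/2, and 5/2 for $V_{1}$ to $V_{5}$ , in agreement with Eqs (13) and (14), and the arg max is the attribute with 0-based index 1, that is, $V_{2}$ (Income).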

As for Algorithm 3.4, a node $M$ is created at the beginning, and $L$ is initialized as a zero vector: $L=\textit{List(0)}$ . Because $\textit{N(L(0))}=n$ rather than 1, we compute $V_{*}^{1}=\textit{BASCA(D,V)}$ with Algorithm 3.3, label node $M$ with $V_{*}^{1}$ , and set $L(V^{-1}(V_{*}^{1}))=1$ . Here, $V_{*}^{1}=V_{2}^{1}$ (Income) and $W_{2}=\{0,1,2\}$ ; therefore, there are three branches from the attribute Income. On the branch $\textit{Income}=0$ , $D_{v_{*}}=\begin{bmatrix}1&1&0&1\\ 1&1&1&0\end{bmatrix}$ and $M_{v_{*}}=\textit{FPQMCA}(D_{v_{*}},V,L)$ ; the node $M_{v_{*}}$ is then attached to node $M$ . When $N(L(0))==1$ , Algorithm 3.4 terminates, and the fast preceding questionnaire model is created. Only the first recursion step is shown here to keep the example short.

Figure 2.

The obtained FPQM for the example training dataset.

The created model is shown in Fig. 2.

$\tilde{U}=\{\tilde{U_{1}},\tilde{U_{2}}\},\quad R=\left[\begin{array}[]{ccccc}1&1&0&1&0\\ 1&0&1&1&0\end{array}\right].$

After the model is created, Algorithm 3.5 can be used for a real investigation on the testing dataset. $V_{*}^{1}=V_{2}^{1}$ ; therefore, $M(1)=2$ and $V(M(1))=V_{2}^{1}$ . $R_{1}=[1\ \ 1\ \ 0\ \ 1\ \ 0]$ . Because $\textit{Ind}=0$ , ${R_{1}}^{\prime}(M(1))={R_{1}}^{\prime}(2)=R_{1}(2)=1$ , $I_{1}(2)=0$ , $P_{1}(2)=1$ , $\textit{Ind}=1$ , and $O_{1}(1)=2$ . Next, $M_{s}(1)=1$ and $V(M_{s}(1))=V_{1}^{1}$ . $C_{*}=\underset{v(M_{s}(1))\in W(M_{s}(1))}{\textit{max}}C(v(M_{s}(1))\mid{R_{1}}^{\prime}(2))=1$ with $v(M_{s}(1))=0$ . Because $C_{*}>\sigma=0.8$ , ${R_{1}}^{\prime}(1)=0$ , $I_{1}(1)=1$ , $P_{1}(1)=1$ , and $O_{1}(2)=1$ . $M_{s}$ is a tree; therefore, we return $\textit{MURIA}(R_{1},V,M_{s},\sigma,1)$ . When Algorithm 3.5 terminates, we obtain $R_{1}^{\prime}=\left[\begin{array}[]{ccccc}0&1&0&1&1\end{array}\right]$ , $P_{1}$ , $I_{1}$ , and $O_{1}$ .

After Algorithm 3.5 finishes investigating the testing dataset $R$ , we can obtain

$\displaystyle R^{\prime}=\left[\begin{array}[]{ccccc}0&1&0&1&1\\ 1&0&1&1&0\end{array}\right],P=\left[\begin{array}[]{ccccc}1&1&1&1&1\\ 1&1&1&0.5&1\end{array}\right],I=\left[\begin{array}[]{ccccc}1&0&1&1&1\\ 1&0&1&0&1\end{array}\right],O=\left[\begin{array}[]{ccccc}2&1&3&4&5\\ 1&3&0&2&4\end{array}\right].$

$M$ is the fast preceding questionnaire model, and $C$ is the collection of all confidence levels that appear when creating $M$ . A questionnaire model is written as $S=\{U,V,W,D,R,R^{\prime},K,P,I,O,M,C\}$ , and $\bar{S}$ is the space of all possible questionnaire models.

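AAR, ARR, and $\textit{AF}_{\beta}$ can be calculated by Algorithm 3.6, which can be applied for model evaluation. Comparing $R^{\prime}$ with $R$ entry by entry in the first pair of loops gives $K=\left[\begin{array}[]{ccccc}0&1&1&1&0\\ 1&1&1&1&1\end{array}\right]$ . The remaining steps give $\textit{AR}_{1}=0.5$ , $\textit{AR}_{2}=1$ , $\textit{RR}_{1}=0.8$ , and $\textit{RR}_{2}=0.6$ , so that $\textit{AAR}=0.75$ , $\textit{ARR}=0.7$ , and $\textit{AF}_{0.5}=0.7114$ with $\beta=0.5$ .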
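The evaluation step can be sketched in Python as below. The function name `evaluate` is hypothetical, $K$ is recomputed from $R$ and $R^{\prime}$ rather than copied, and the handling of an individual with no inferred attribute ( $\textit{AR}_{i}=1$ ) is an assumption taken from the convention stated with the evaluation metrics in Section 4.2.

```python
def evaluate(R, R_final, I, beta):
    """Return (AAR, ARR, AF_beta) for real values R, final values R' and flags I."""
    m = len(R)
    ar_sum = rr_sum = f_sum = 0.0
    for real, final, flags in zip(R, R_final, I):
        # K_ij = 1 where the final value agrees with the real value.
        K = [int(a == b) for a, b in zip(real, final)]
        inferred = sum(flags)
        # Accuracy over inferred attributes only; assumed to be 1 if none inferred.
        ar_i = sum(k * i for k, i in zip(K, flags)) / inferred if inferred else 1.0
        rr_i = inferred / len(flags)
        f_i = (beta ** 2 + 1) * ar_i * rr_i / (beta ** 2 * ar_i + rr_i)
        ar_sum += ar_i
        rr_sum += rr_i
        f_sum += f_i
    return ar_sum / m, rr_sum / m, f_sum / m


# R, R' and I as given in the example above.
R = [[1, 1, 0, 1, 0], [1, 0, 1, 1, 0]]
R_final = [[0, 1, 0, 1, 1], [1, 0, 1, 1, 0]]
I = [[1, 0, 1, 1, 1], [1, 0, 1, 0, 1]]

aar, arr, af = evaluate(R, R_final, I, beta=0.5)
print(round(aar, 4), round(arr, 4), round(af, 4))
```

With $\beta=0.5$ the sketch returns an $\textit{AF}_{0.5}$ that rounds to 0.7114, the value quoted for this example.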

The problem is defined as follows.

$\displaystyle\textit{max\ AF}_{\beta}\ \ \textit{subject\ to}\ S\in\bar{S}$

3.8 The comparison with decision tree

The fast preceding questionnaire model (FPQM) is similar to Decision Tree to some extent, but they are different models. There are many types of Decision Tree algorithms. The notable models include ID3, C4.5, C5.0, CART, CHAID, MARS, and Conditional Inference Trees. ID3, C4.5 and CART are chosen as representative Decision Tree algorithms. The FPQM is compared with these three Decision Tree algorithms in all aspects[5, 13, 26].

3.8.1 The comparison on measures

The algorithms for constructing decision trees choose, at each step, the attribute that best splits the dataset. Different Decision Tree algorithms use different metrics for determining the “best” attribute. ID3 uses information gain, C4.5 uses the gain ratio, and CART uses the Gini index, whereas the FPQM uses Influence, which is defined in Definitions 21–25.
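To make the difference between the measures concrete, the sketch below evaluates the building blocks of each on one small distribution. It is an illustration under assumptions: Definitions 21–25 are not reproduced here, so the sum of squared probabilities from Eq. (10) is used as a stand-in for the per-value influence, and the function names are hypothetical. Entropy underlies information gain and the gain ratio, and the Gini impurity underlies the Gini index.

```python
from collections import Counter
from math import log2


def distribution(values):
    """Relative frequencies of the observed values."""
    counts = Counter(values)
    return [c / len(values) for c in counts.values()]


def entropy(p):
    """Shannon entropy in bits (basis of information gain and gain ratio)."""
    return -sum(q * log2(q) for q in p if q > 0)


def gini(p):
    """Gini impurity (basis of the Gini index used by CART)."""
    return 1 - sum(q * q for q in p)


def squared_sum(p):
    """Sum of squared probabilities, as in Eq. (10) (assumed per-value influence)."""
    return sum(q * q for q in p)


# Income values of the three Table 1 individuals with Education = 1.
p = distribution([2, 0, 0])
print(round(entropy(p), 4), round(gini(p), 4), round(squared_sum(p), 4))
```

For any distribution the sum of squared probabilities equals one minus the Gini impurity, so under this assumption the per-value quantity rewards purity in the same way the Gini impurity penalizes impurity. What differs in the FPQM is how the quantity is aggregated: over every other attribute rather than over a single class attribute.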

3.8.2 The comparison on solved problems

ID3 and C4.5 solve classification problems, and CART can solve both classification and regression problems. On the other hand, the FPQM is designed to optimize questionnaires by reordering the attributes and reducing the number of attributes by predicting the low-ranking attributes. The problems solved by the FPQM and by Decision Tree algorithms are therefore entirely different.

3.8.3 The comparison on model forms

The model form of the FPQM looks very similar to that of Decision Tree algorithms; both are in tree forms, as shown in Fig. 3.

Figure 3.

A part of the obtained FPQM.

3.8.4 The comparison on attribute types

The attribute types that the FPQM and Decision Tree can handle are different. ID3 can only handle nominal types, C4.5 and CART can handle both nominal and numeric types, and the FPQM can only handle nominal attribute types.

3.8.5 The comparison on splits

The split methods of the FPQM and Decision Tree are different. ID3 and C4.5 split the data in multiple ways, CART splits the data as a 2-way split, and the FPQM splits the data in multiple ways. CART can only create binary trees, whereas ID3, C4.5 and the FPQM can create general trees.

The comparisons between the FPQM and Decision Tree in all aspects are listed in Table 3.

Table 3
The comparison with decision tree

	Decision tree			
Aspect	ID3	C4.5	CART	FPQM
Measure	Information gain	Gain ratio	Gini index	Influence
Solved problem	Classification	Classification	Classification/Regression	Reorder & Reduce
Model form	Tree	Tree	Tree	Tree
Attribute type	Nominal	Nominal/Numeric	Nominal/Numeric	Nominal
Split	Multi-way	Multi-way	2-way	Multi-way

3.9 The time complexity of the FPQM

There are many loops and recursive steps in the four algorithms constituting the FPQM; the time complexity of the FPQM is analyzed below. The derivation of the time complexity is described in detail in the Appendix.

3.9.1 The time complexity of the BASCA

The time complexity of the BASCA is

$T_{\textit{BASCA}}(n,\bar{N})=O(n^{2}{\bar{N}}^{2})$ (16)

where $\bar{N}$ is the average of $N_{j}$ in $N$ , $\bar{N}=\frac{1}{n}\sum_{j=1}^{n}N_{j}$ , and $n$ is the number of investigation attributes in $V$ .

3.9.2 The time complexity of the FPQMCA

The time complexity of the FPQMCA is

$T_{\textit{FPQMCA}}(n,\bar{N})=O(\bar{N}^{n+1}+n^{2}\bar{N})$ (17)

3.9.3 The time complexity of the MURIA

The time complexity of the MURIA is

$\begin{split}\displaystyle T_{\textit{MURIA}}(n,\bar{N})=O(n\bar{N})\end{split}$ (18)

3.9.4 The time complexity of the MEA

The time complexity of the MEA is

$\displaystyle T_{\textit{MEA}}(n,\tilde{m})=O(\tilde{m}n)$ (19)

where $\tilde{m}$ is the number of individuals in $\tilde{U}$ .

3.9.5 The time complexity of the FPQM

The time complexity of the FPQM is

$\displaystyle T_{\textit{FPQM}}(n,\bar{N},\tilde{m})=T_{\textit{BASCA}}(n,\bar{N})+T_{\textit{FPQMCA}}(n,\bar{N})+T_{\textit{MURIA}}(n,\bar{N})+T_{\textit{MEA}}(n,\tilde{m})$

$\displaystyle\quad=O(n^{2}{\bar{N}}^{2})+O(\bar{N}^{n+1}+n^{2}\bar{N})+O(n\bar{N})+O(\tilde{m}n)$

$\displaystyle\quad=O(n^{2}{\bar{N}}^{2})+O(\bar{N}^{n+1})+O(\tilde{m}n)$ (20)

4. Experiments

4.1 Experimental data

The experiments are based on actual data from the Lime Family Limited Company (Lime Family). Lime Family focuses on the pension service field and is the nationwide leading provider of high-quality home care services. Lime Family originally required 45 investigation attributes to assess a newcomer. The investigation attributes are based on the Barthel Index [18] and an ability assessment for elderly adults [28]. A total of 45 investigation attributes seems excessive for the elderly individuals who are asked to give the value of the attributes one by one, and the order of the attributes is not reasonable. Therefore, every assessment requires approximately 15–20 minutes, which is also excessive, resulting in customer churn in practice.

4.1.1 Noise

There are three main classes of noise: spurious readings, measurement error, and background data. Some values of the Height and Weight attributes are 0, and some values of the Age attribute are 2015. These are the spurious reading type of noise data; it is easy for people to see that they are incorrect values. Height, Weight and Age are numeric attributes, and thus, these noise data are replaced with a sample mean.

4.1.2 Missing values

Certain values of the Religion, Ground Walking, and Up Down Stairs attributes are missing. Religion is a nominal attribute, and Ground Walking and Up Down Stairs are ordinal attributes; thus, these missing values are replaced with a sample mode.
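The two replacement rules above (sample mean for spurious numeric readings, sample mode for missing nominal or ordinal values) can be sketched as follows. The helper names, the validity ranges, and the sample values are hypothetical and are not taken from the Lime Family data; computing the mean over the valid readings only is also an assumption.

```python
from statistics import mean, mode


def replace_spurious(values, is_valid):
    """Replace invalid numeric readings with the mean of the valid ones."""
    valid = [v for v in values if is_valid(v)]
    fill = mean(valid)
    return [v if is_valid(v) else fill for v in values]


def fill_missing(values, missing=None):
    """Replace missing nominal or ordinal entries with the most frequent value."""
    present = [v for v in values if v is not missing]
    fill = mode(present)
    return [fill if v is missing else v for v in values]


# Hypothetical samples: a height of 0 and an age of 2015 are spurious readings.
heights = replace_spurious([165, 0, 172, 158], lambda h: h > 0)
ages = replace_spurious([78, 2015, 82, 80], lambda a: 0 < a < 130)
religion = fill_missing(["none", None, "buddhist", "none"])
print(heights, ages, religion)
```

The validity predicate is passed in as a function so that each attribute can state its own plausible range.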

After preprocessing, including discretization, handling missing values, addressing noise data, and textual data processing, all data are converted into nominal data. The investigation attributes can be observed in Table 4.

Table 4
Investigation attributes

No.	Attribute name	No.	Attribute name	No.	Attribute name
0	Sex	15	Transfer Bed Chair	30	Financial Affairs Capability
1	Age	16	Ground Walking	31	Cognitive Function
2	Ethnic	17	Up Down Stairs	32	Attacks
3	Education	18	Witted	33	Depressive Symptoms
4	Religion	19	Height	34	Consciousness Level
5	Marital Status	20	Weight	35	Vision
6	Housing Condition	21	Glasses	36	Hearing
7	Income	22	Hearing Aid	37	Communication
8	Eating	23	Go Shopping	38	Life Skills
9	Bathe	24	Outings	39	Work Ability
10	Modify	25	Food Cooking	40	Time Spatial Orientation
11	Clothing	26	Maintain Housework	41	People Orientation
12	Stool Control	27	Washing Clothes	42	Social Skills
13	Pee Control	28	Use Phone Ability	43	Event
14	Toilet	29	Medication	44	Occurrences Times

The entire experiment is performed using the 64-bit Python language on a MacBook Pro, with a 2.2 GHz Intel Core i7 CPU and 16 GB of 1600 MHz DDR3 memory.

4.2 Evaluation metrics

To evaluate the performance of the FPQM, the accuracy rate, average accuracy rate, standard deviation of the accuracy rate, reduction rate, average reduction rate, and standard deviation of the reduction rate are defined as follows.

Accuracy rate (AR):

$\textit{AR}_{i}=\frac{\sum_{j=1}^{n}K_{ij}*I_{ij}}{\sum_{j=1}^{n}I_{ij}}$ (21)

$\textit{AR}_{i}$ is the accuracy rate of individual $\tilde{U_{i}}$ .

Average accuracy rate (AAR):

$\textit{AAR}=\frac{1}{\tilde{m}}\sum_{i=1}^{\tilde{m}}\frac{\sum_{j=1}^{n}K_{ij}*I_{ij}}{\sum_{j=1}^{n}I_{ij}}$ (22)

AAR is the average accuracy rate of $\tilde{U}$ and describes how accurate the model can be.

Standard deviation of the accuracy rate (SAR):

$\textit{SAR}=\sqrt{\frac{1}{\tilde{m}}\sum_{i=1}^{\tilde{m}}(\textit{AR}_{i}-\textit{AAR})^{2}}$ (23)

SAR is the standard deviation of the accuracy rate and describes the volatility of the accuracy rate.

Reduction rate (RR):

$\textit{RR}_{i}=\frac{\sum_{j=1}^{n}I_{ij}}{n}$ (24)

$\textit{RR}_{i}$ is the reduction rate of $\tilde{U_{i}}$ .

Average reduction rate (ARR):

$\textit{ARR}=\frac{1}{\tilde{m}}\sum_{i=1}^{\tilde{m}}\frac{\sum_{j=1}^{n}I_{ij}}{n}$ (25)

ARR is the average reduction rate of $\tilde{U}$ and describes how the model can accelerate the questionnaire.

Standard deviation of the reduction rate (SRR):

$\textit{SRR}=\sqrt{\frac{1}{\tilde{m}}\sum_{i=1}^{\tilde{m}}(\textit{RR}_{i}-\textit{ARR})^{2}}$ (26)

SRR is the standard deviation of the reduction rate and describes the volatility of the reduction rate.

$F_{\beta}$ -Measure (F):

$F_{\beta_{i}}=\frac{(\beta^{2}+1)*\textit{AR}_{i}*\textit{RR}_{i}}{\beta^{2}*\textit{AR}_{i}+\textit{RR}_{i}}$ (27)

$F_{\beta_{i}}$ is the $F_{\beta}$ -Measure of $\tilde{U_{i}}$ .

Average $F_{\beta}$ -Measure (AF):

$\textit{AF}_{\beta}=\frac{1}{\tilde{m}}\sum_{i=1}^{\tilde{m}}\frac{(\beta^{2}+1)*\textit{AR}_{i}*\textit{RR}_{i}}{\beta^{2}*\textit{AR}_{i}+\textit{RR}_{i}}$ (28)

$\textit{AF}_{\beta}$ is the average $F_{\beta}$ -Measure of $\tilde{U}$ and describes a balance between AAR and ARR.

When $I_{ij}=0\ \forall i,j$ , $\textit{ARR}=0$ and $\textit{AAR}=1$ ; therefore, $\textit{AF}_{\beta}=0$ . When $I_{ij}=1\ \forall i,j$ , $\textit{ARR}=1$ , and both AAR and $\textit{AF}_{\beta}$ will be low. A proper $I_{ij}$ should be chosen to maximize $\textit{AF}_{\beta}$ with proper AAR and ARR. $\beta=0.5$ is chosen in this experiment because ARR is as important as AAR, and the goal is to improve ARR and AAR simultaneously.

Standard deviation of $F_{\beta}$ -Measure (SF):

$\textit{SF}=\sqrt{\frac{1}{\tilde{m}}\sum_{i=1}^{\tilde{m}}(F_{\beta_{i}}-\textit{AF}_{\beta})^{2}}$ (29)

SF is the standard deviation of the $F_{\beta}$ -Measure and describes the volatility of the $F_{\beta}$ -Measure, which is the volatility of the overall performance.
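The three volatility metrics divide by $\tilde{m}$ , so they are population standard deviations. A minimal sketch, assuming illustrative per-individual values for two respondents (not experimental data) and the hypothetical helper `f_beta`:

```python
from statistics import mean, pstdev


def f_beta(ar, rr, beta):
    """Per-individual F-measure, Eq. (27)."""
    return (beta ** 2 + 1) * ar * rr / (beta ** 2 * ar + rr)


# Illustrative per-individual accuracy and reduction rates (assumed values).
AR = [0.5, 1.0]
RR = [0.8, 0.6]
F = [f_beta(a, r, beta=0.5) for a, r in zip(AR, RR)]

# pstdev divides by the number of individuals, matching Eqs (23), (26), (29).
SAR, SRR, SF = pstdev(AR), pstdev(RR), pstdev(F)
print(round(mean(AR), 4), round(SAR, 4))
print(round(mean(RR), 4), round(SRR, 4))
print(round(mean(F), 4), round(SF, 4))
```

Using `statistics.stdev` instead would divide by $\tilde{m}-1$ and give slightly larger values, so `pstdev` is the closer match to the definitions above.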

4.3 The overall results

The depth of the obtained FPQM is 45 because there are 45 investigation attributes in total. There is a corresponding rule with every node in the FPQM, and there are a total of 6072 rules. A part of the obtained FPQM is shown in Fig. 3.

A reasonable order for the investigation attributes can be obtained from the FPQM. Transfer Bed Chair is the first investigation attribute. For a new individual $\tilde{U_{i}}$ of $\tilde{U}$ , if the Transfer Bed Chair value of $\tilde{U_{i}}$ equals 1, we enter the corresponding branch of the FPQM, and the second investigation attribute is Outings. This continues recursively until the last investigation attribute is reached, which yields a reasonable order for $\tilde{U_{i}}$ : Transfer Bed Chair, Outings, etc. Every elderly individual thus has an exclusive questionnaire order that is made especially for that individual [16].

The FPQM is evaluated using the metrics AR, RR, and $F$ , which are calculated with Eqs (21), (24), and (27), respectively. The AR, RR, and $F$ of the FPQM are shown in Fig. 4. The y-axis indicates the values of AR, RR, and $F$ , and the x-axis indicates the id of the elderly individual, thereby representing each investigated person. AR is more volatile than RR; accordingly, $F$ is also volatile. The FPQM is only a preceding questionnaire model, and the time required to obtain the pre-result is very short. To ensure 100% accuracy, the obtained pre-result needs to be verified by the elderly individual, but verification is much faster than asking every question directly.

Figure 4.

AR, RR and F of the FPQM.

Figure 5.

AR, RR and F of the Expert Knowledge method.

4.4 Comparison experiments

To demonstrate that the FPQM is a good model for fast preceding questionnaires, Expert Knowledge, Rough Set [12, 17], and C4.5[13, 23] are applied to solve the same problem. The results of the FPQM and those from these three methods are compared.

4.4.1 Expert Knowledge

AR, RR and $F$ are shown in Fig. 5 when using the Expert Knowledge method. Expert Knowledge achieves a high AR, but the RR is very low. This result conforms with the characteristics of experts, who ensure high AR but do not greatly affect RR. AR is more volatile than RR between elderly individuals; accordingly, $F$ is also volatile.

4.4.2 Rough Set

The Rough Set method is also applied to solve the same problem in four steps.

Step 1: First, take one attribute as the decision attribute and the other attributes as the condition attributes. The correlation degree between each condition attribute and the decision attribute is calculated, and each attribute whose correlation degree is less than a given threshold is deleted. Second, calculate the correlation degree between each condition attribute and all the other condition attributes. Delete the condition attributes whose correlation degree with this particular condition attribute is larger than the degree with the decision attribute.

Step 2: Generate the decision rules with the reliability and coverage degrees. Let the decision equivalent class (DEC) be the collection of elderly individuals who have the same assessment result. Calculate the reliability and coverage degree for each DEC and delete those classes whose degree is less than a given threshold. The classes that may generate rarely appearing rules and uncertain rules can be deleted in this manner. The decision rules can be created using each of the remaining decision equivalent classes.

Step 3: Sort the investigation attributes by the coverage degree. Merge the rules according to the coverage degree and create the assessment sequence of attributes by sorting the coverage degrees in descending order.

Step 4: Use the merged rules and assessment sequence to simulate the assessment process for the new elderly individual in $\tilde{U}$ .

When using the Rough Set method, AR, RR and $F$ are shown in Fig. 6. Rough Set achieves a high AR, but RR is slightly low. Both AR and RR are very volatile; therefore, $F$ is also quite volatile.

Figure 6.

AR, RR and F of the Rough Set method.

4.4.3 C4.5

Now, we solve the same problem by applying the C4.5 method. We choose each of the 45 investigation attributes in $V$ in turn as the terminal node, with the other attributes as internal nodes. Using the C4.5 method, one decision tree is obtained for each choice, giving 45 decision trees for the 45 attributes. Then, we evaluate the new individual $\tilde{U_{i}}$ of $\tilde{U}$ with the 45 obtained decision trees. Initially, there are no attributes that can be inferred, and we ask the elderly individual $\tilde{U_{i}}$ for the real value $R_{ij}$ directly. After accumulating sufficient real values $\{R_{ij}\}$ , $j\in\{1,2,\ldots,j_{*}\}$ , some attributes can be inferred as the leaf nodes of some of the 45 obtained decision trees; then, we can infer their final values $R_{ij}^{\prime}$ . Otherwise, we continue to ask about $R_{ij}$ directly. Finally, all final values $R_{ij}^{\prime}$ of the 45 attributes can be obtained.

AR, RR and $F$ using the C4.5 method are shown in Fig. 7. C4.5 achieves very low AR, RR and $F$ . The volatility of AR is slightly high, but C4.5 achieves a significantly lower volatility in RR and $F$ .

Figure 7.

AR, RR and F of the C4.5 method.

Figure 8.

The AR comparison for the FPQM, Expert Knowledge, Rough Set, and C4.5 methods.

4.4.4 Comparison results

The FPQM, Expert Knowledge, Rough Set, and C4.5 methods are compared with respect to AR, RR and $F$ individually to demonstrate which method achieves the best result.

Figure 8 shows the AR comparison for the FPQM, Expert Knowledge, Rough Set, and C4.5 methods. The FPQM achieves the best AR performance, and the volatility is the lowest. Expert Knowledge achieves a slightly better AR than the Rough Set and C4.5 methods, but these three methods have almost the same volatility.

The RR comparison of the FPQM, Expert Knowledge, Rough Set, and C4.5 methods is shown in Fig. 9. The FPQM also achieves the best RR, which is far higher than those of the other three methods. The Rough Set method achieves the second best value and has the greatest volatility. By comparison, the RR values of the Expert Knowledge and C4.5 methods are the worst and approximately equal.

Table 5
Mean and standard deviation of the results under the four methods

Metric	Method	Mean	Standard deviation
AR	FPQM	0.9839	0.0560
	Expert Knowledge	0.9203	0.1993
	Rough Set	0.7589	0.2223
	C4.5	0.6850	0.2072
RR	FPQM	0.9056	0.0221
	Expert Knowledge	0.0769	0.0262
	Rough Set	0.3335	0.1003
	C4.5	0.0916	0.0312
F	FPQM	0.9666	0.0461
	Expert Knowledge	0.2800	0.0870
	Rough Set	0.5984	0.1758
	C4.5	0.2778	0.0637

Figure 9.

The RR comparison of the FPQM, Expert Knowledge, Rough Set, and C4.5 methods.

$F$ is an integrated measurement of AR and RR. Figure 10 shows the $F$ comparison for the FPQM, Expert Knowledge, Rough Set, and C4.5 methods. Because the FPQM achieves the best AR and RR, it naturally achieves the best $F$ as well, and its volatility is the lowest. Because the other three methods have almost the same AR volatility, their $F$ volatility is mainly determined by RR. The $F$ performance of the Rough Set method is the second best but is the most volatile. The Expert Knowledge and C4.5 methods are the worst and approximately equal, but they do not exhibit a very high volatility.

Figure 10.

The F comparison of the FPQM, Expert Knowledge, Rough Set, and C4.5 methods.

Table 5 shows the mean and standard deviation of the four methods with respect to AR, RR, and $F$ . The mean reflects the overall performance, and the standard deviation expresses the volatility numerically. The mean of AR is denoted as AAR, as mentioned previously; similar notation is used for RR and $F$ . These values can be calculated with Eqs (22), (25), and (28). The standard deviation of AR is denoted as SAR; similar notation is used for RR and $F$ . These values can be calculated with Eqs (23), (26), and (29).

The FPQM has the largest mean and the smallest standard deviation with respect to AR, RR and $F$ . This means that the FPQM achieves the best performance and has the lowest volatility. The Expert Knowledge method has the second best mean for AR, 0.9203, and it has the smallest mean for RR, which conforms to the characteristics of expert knowledge. The Rough Set method is the most volatile with respect to AR, RR and $F$ , and it achieves the second best mean for $F$ and RR. The performance of the C4.5 method is almost the worst.

Figure 11.

AAR, ARR, AF, SAR, SRR, and SF with increasing number of elderly individuals.

Figure 12.

AAR, ARR, AF, SAR, SRR, and SF with increasing number of investigation attributes.

Figure 13.

AAR, ARR, AF, SAR, SRR, and SF with increasing threshold $\sigma$ .

4.4.5 Result analysis

Some of the improvements in the results are not highly significant when the FPQM is compared with other methods, such as the Rough Set method. We can explain this as follows. The Rough Set method is also an excellent method for attribute reduction. Both the Rough Set and FPQM methods can capture the internal relationships among the attributes listed in Table 4, and predict multiple attributes simultaneously. Therefore, the FPQM does not achieve highly significant improvements compared with the Rough Set method.

4.5 Factor analysis

Three factors are analyzed: the number of elderly individuals, the number of investigation attributes, and the threshold $\sigma$ . The evaluation metrics AAR, ARR, AF, SAR, SRR, and SF are used to show the changes in the results. AAR, ARR, and AF can be calculated with Eqs (22), (25), and (28), respectively. Equations (23), (26), and (29) can be used to calculate SAR, SRR, and SF, respectively.

As illustrated in Fig. 11, ARR decreases steadily and with minimal volatility when the number of the elderly individuals increases. AAR and AF are volatile and present a slight wavelike decrease overall. SRR increases continuously with a small slope. SF is highly similar to SAR because SRR is so small that SF is mainly determined by SAR.

With increasing number of investigation attributes, AAR, ARR, and AF present a fluctuating increasing trend, as shown in Fig. 12. The increased level of ARR is higher than that of AAR and AF, and ARR shows slightly larger volatility. SAR, SRR, and SF decrease with some volatility, and SRR falls fairly steadily in the second half of the curve. SRR has minimal impact on SF, and SF has the same general trend as SAR. Briefly, more investigation attributes produce better results.

Notice that some parts of the AAR, ARR, AF, SAR, SRR, and SF curves are flat, which simply reflects the intermittent nature of the threshold, as illustrated in Fig. 13. A significant change occurs only when the threshold $\sigma$ reaches certain levels. Changes mainly occur when $\sigma$ is in the range of 0.3–0.65. If $\sigma$ is too small, $I_{ij}=1\ \forall 1\leqslant i\leqslant\tilde{m},1\leqslant j\leqslant n$ . On the other hand, $I_{ij}=0\ \forall 1\leqslant i\leqslant\tilde{m},1\leqslant j\leqslant n$ when $\sigma$ is too large. When the threshold $\sigma$ increases, AAR, AF, and SRR present an increasing trend, whereas ARR, SAR and SF decrease. SRR simply shows a slow growth. ARR and SRR have minimal impact on AF and SF, respectively; consequently, AF and SF show the same general trends as AAR and SAR, respectively. To achieve better results, it is therefore suggested to choose a slightly larger threshold $\sigma$ .

5. Conclusion

Traditional questionnaires ask respondents one question after another. The two main problems are inefficiency and an unreasonable question order. In this paper, the fast preceding questionnaire model (FPQM) is proposed to solve these problems in five steps, as shown in Fig. 1. The influence calculation formula, best attribute to split choosing algorithm (BASCA), fast preceding questionnaire model creating algorithm (FPQMCA), model used for real investigation algorithm (MURIA), and model evaluation algorithm (MEA) are all presented. The experimental section presents the experimental data; the evaluation metrics; the overall results of the FPQM; the comparison experiments with the Expert Knowledge, Rough Set, and C4.5 methods; and a factor analysis covering the number of elderly individuals, the number of investigation attributes, and the threshold. When the FPQM was applied at the Lime Family company, most of the remaining attributes could be inferred automatically after certain attributes had been asked about, with high accuracy, a high reduction rate, and low volatility. To ensure 100% correctness, the model can be used as a preceding questionnaire whose inferred answers the elderly individual then verifies, which is much faster than asking every question directly.

Further work will focus on three points. After the elderly individual is well assessed, a thorough study should be performed on how to automatically recommend an appropriate health care plan according to the assessment result. Then, the effect of the health care plan should be assessed as well. Moreover, as the amount of data on elderly individuals continues to increase, distributed platforms such as Hadoop and Spark will be considered.

A. The time complexity of the FPQM

A.1 The time complexity of the BASCA

As shown in Algorithm 3.3, there are two loops over the attributes (Lines 3.3 and 3.3) and two loops over the values of the attributes (Lines 3.3 and 3.3) in the BASCA. As shown in Definition 3, $V=\{V_{1},V_{2},\ldots,V_{n}\}$ ; therefore, there are $n$ investigation attributes in $V$ . $V_{j_{k_{1}}}^{t}$ has $N_{jk_{1}}$ possible values, where $N_{jk_{1}}$ is defined in Definition 5. Therefore, the time complexity of Lines 3.3–3.3 is

$\displaystyle T_{\textit{BASCA}}^{1}(n,\bar{N})=O(n^{2}{\bar{N}}^{2})$ (30)

where $\bar{N}$ is the average of $N_{j}$ in $N$ , $\bar{N}=\frac{1}{n}\sum_{j=1}^{n}N_{j}$ , and $n$ is the number of investigation attributes in $V$ .

Line 3.3 calculates $V_{*}^{t}$ with Eq. (5), and there are $n$ investigation attributes in $V$ ; therefore, the time complexity of Line 3.3 is

$\displaystyle T_{\textit{BASCA}}^{2}(n,\bar{N})=O(n)$ (31)

Thus, the time complexity of the BASCA is

$\displaystyle T_{\textit{BASCA}}(n,\bar{N})=T_{\textit{BASCA}}^{1}(n,\bar{N})+T_{\textit{BASCA}}^{2}(n,\bar{N})=O(n^{2}{\bar{N}}^{2})$ (32)
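The $O(n^{2}{\bar{N}}^{2})$ bound can be checked by counting iterations. The sketch below assumes only the loop structure described above (two loops over attributes, two loops over their values) and that every attribute pair is visited; the function name and the value counts are hypothetical, and the actual BASCA may order or prune its loops differently.

```python
def basca_inner_iterations(value_counts):
    """Count innermost iterations of four nested loops (attribute pair x value pair)."""
    iterations = 0
    for n_j in value_counts:              # loop over attributes V_j
        for n_k in value_counts:          # loop over attributes V_k
            for _ in range(n_j):          # loop over the values of V_j
                for _ in range(n_k):      # loop over the values of V_k
                    iterations += 1
    return iterations


# Hypothetical numbers of possible values N_j for n = 4 attributes.
value_counts = [2, 3, 4, 3]
n = len(value_counts)
mean_values = sum(value_counts) / n

count = basca_inner_iterations(value_counts)
print(count, (n * mean_values) ** 2)
```

Because $\sum_{j}\sum_{k}N_{j}N_{k}=(\sum_{j}N_{j})^{2}=(n\bar{N})^{2}$, the count equals $n^{2}\bar{N}^{2}$ exactly under these assumptions, which is consistent with Eq. (30).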

A.2 The time complexity of the FPQMCA

The time complexity of Lines 3.4–3.4 is

$\displaystyle T_{\textit{FPQMCA}}^{1}(n,\bar{N})=O(1)$ (33)

The time complexity of Line 3.4 is

$\displaystyle T_{\textit{FPQMCA}}^{2}(n,\bar{N})=O(n^{2}{\bar{N}}^{2})$ (34)

The time complexity of Lines 3.4–3.4 is

$\displaystyle T_{\textit{FPQMCA}}^{3}(n,\bar{N})=O(1)$ (35)

The time complexity of Lines 3.4–3.4 is

$\displaystyle T_{\textit{FPQMCA}}^{4}(n,\bar{N})=\bar{N}*T_{\textit{FPQMCA}}(n-1,\bar{N})$ (36)

The time complexity of the FPQMCA is

$\displaystyle\begin{aligned}T_{\textit{FPQMCA}}(n,\bar{N})&=T_{\textit{FPQMCA}}^{1}(n,\bar{N})+T_{\textit{FPQMCA}}^{2}(n,\bar{N})+T_{\textit{FPQMCA}}^{3}(n,\bar{N})+T_{\textit{FPQMCA}}^{4}(n,\bar{N})\\
&=O(1)+O(n^{2}{\bar{N}}^{2})+O(1)+\bar{N}*T_{\textit{FPQMCA}}(n-1,\bar{N})\\
&=O(n^{2}{\bar{N}}^{2})+\bar{N}*T_{\textit{FPQMCA}}(n-1,\bar{N})\end{aligned}$ (37)

Equation (37) is the recursion formula of $T_{\textit{FPQMCA}}(n,\bar{N})$ .

$\displaystyle\begin{aligned}T_{\textit{FPQMCA}}(n,\bar{N})&=O(n^{2}{\bar{N}}^{2})+\bar{N}*T_{\textit{FPQMCA}}(n-1,\bar{N})\\
&=O(n^{2}{\bar{N}}^{2})+\bar{N}*\big\{O\big((n-1)^{2}\bar{N}^{2}\big)+\bar{N}*T_{\textit{FPQMCA}}(n-2,\bar{N})\big\}\\
&=O(n^{2}{\bar{N}}^{2})+O\big((n-1)^{2}\bar{N}^{3}\big)+\bar{N}^{2}*T_{\textit{FPQMCA}}(n-2,\bar{N})\\
&=O(n^{2}{\bar{N}}^{2})+O\big((n-1)^{2}\bar{N}^{3}\big)+\ldots+O(1^{2}\bar{N}^{n+1})+\bar{N}^{n-1}*T_{\textit{FPQMCA}}(1,\bar{N})\\
&=\sum_{i=1}^{n}O(i^{2}{\bar{N}}^{n-i+2})+\bar{N}^{n-1}*T_{\textit{FPQMCA}}(1,\bar{N})\\
&=O\Big(\sum_{i=1}^{n}(i^{2}{\bar{N}}^{n-i+2})\Big)+\bar{N}^{n-1}*T_{\textit{FPQMCA}}(1,\bar{N})\end{aligned}$ (38)

Let $S=\sum_{i=1}^{n}(i^{2}{\bar{N}}^{n-i+2})$ .

$\displaystyle(\bar{N}-1)S=\bar{N}^{n+2}+\sum_{i=1}^{n-1}\big((2i+1){\bar{N}}^{n-i+2}\big)-n^{2}\bar{N}^{2}$ (39)

$\displaystyle\bar{N}(\bar{N}-1)S-(\bar{N}-1)S=\bar{N}^{n+3}+2\sum_{i=4}^{n+2}\bar{N}^{i}-(n^{2}+2n-1)\bar{N}^{3}+n^{2}\bar{N}^{2}$ (40)

$\displaystyle S=\frac{1}{(\bar{N}-1)^{2}}\Big[\bar{N}^{n+3}+\frac{2\bar{N}^{4}(1-\bar{N}^{n-1})}{1-\bar{N}}-(n^{2}+2n-1)\bar{N}^{3}+n^{2}\bar{N}^{2}\Big]$ (41)
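The closed form in Eq. (41) can be checked against the defining sum with exact rational arithmetic. This is only a numerical sanity check over a small grid, not a proof; the function names `s_direct` and `s_closed` are introduced here for illustration.

```python
from fractions import Fraction


def s_direct(n, N):
    """S = sum_{i=1}^{n} i^2 * N^(n-i+2), evaluated term by term."""
    return sum(i * i * N ** (n - i + 2) for i in range(1, n + 1))


def s_closed(n, N):
    """Right-hand side of Eq. (41); requires N != 1."""
    N = Fraction(N)
    geometric = 2 * N**4 * (1 - N ** (n - 1)) / (1 - N)
    bracket = N ** (n + 3) + geometric - (n * n + 2 * n - 1) * N**3 + n * n * N**2
    return bracket / (N - 1) ** 2


# Compare both forms exactly on a small grid of n and N.
for n in range(1, 8):
    for N in range(2, 6):
        assert s_closed(n, N) == s_direct(n, N), (n, N)
print("Eq. (41) matches the direct sum for n = 1..7 and N = 2..5")
```

`Fraction` keeps the division by $(\bar{N}-1)^{2}$ exact, so the comparison is not affected by floating-point rounding; a non-integer average $\bar{N}$ can be passed as a `Fraction` as well.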

Substituting Eq. (41) into Eq. (38) and noting that $T_{\textit{FPQMCA}}(1,\bar{N})=O(1)$ gives

$\displaystyle\begin{aligned}T_{\textit{FPQMCA}}(n,\bar{N})&=O\Big(\frac{1}{(\bar{N}-1)^{2}}\Big[\bar{N}^{n+3}+\frac{2\bar{N}^{4}(1-\bar{N}^{n-1})}{1-\bar{N}}-(n^{2}+2n-1)\bar{N}^{3}+n^{2}\bar{N}^{2}\Big]\Big)+\bar{N}^{n-1}*T_{\textit{FPQMCA}}(1,\bar{N})\\
&=O\Big(\frac{1}{(\bar{N}-1)^{2}}\Big[\bar{N}^{n+3}+\frac{2\bar{N}^{4}(1-\bar{N}^{n-1})}{1-\bar{N}}-(n^{2}+2n-1)\bar{N}^{3}+n^{2}\bar{N}^{2}\Big]\Big)+O(\bar{N}^{n-1})\\
&=O\Big(\frac{1}{\bar{N}^{2}}\Big[\bar{N}^{n+3}+\frac{\bar{N}^{4}\bar{N}^{n-1}}{\bar{N}}-n^{2}\bar{N}^{3}+n^{2}\bar{N}^{2}\Big]\Big)+O(\bar{N}^{n-1})\\
&=O(\bar{N}^{n+1}+n^{2}\bar{N})\end{aligned}$ (42)

The time complexity of the FPQMCA is

$\displaystyle T_{\textit{FPQMCA}}(n,\bar{N})=O(\bar{N}^{n+1}+n^{2}\bar{N})$ (43)
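The exponential growth can also be observed by iterating the recursion of Eq. (37) directly. The sketch below is a unit-cost model, assuming every $O(\cdot)$ term has constant factor 1 and $T(1)=1$; the function name `t_fpqmca` is hypothetical.

```python
def t_fpqmca(n, N):
    """Unit-cost version of Eq. (37): T(n) = n^2 N^2 + N * T(n-1), with T(1) = 1."""
    cost = 1
    for k in range(2, n + 1):
        cost = k * k * N * N + N * cost
    return cost


# Ratio of the modelled cost to the leading term N^(n+1) of Eq. (43).
for N in (2, 3, 5):
    for n in (2, 5, 10, 20):
        ratio = t_fpqmca(n, N) / N ** (n + 1)
        print(f"N={N} n={n:2d}  T(n)/N^(n+1) = {ratio:.3f}")
```

In this model the ratio levels off at a constant that depends only on $\bar{N}$, which is consistent with $\bar{N}^{n+1}$ being the dominant term when $\bar{N}\geqslant 2$. It also suggests that building the model is the expensive step, so $n$ and $\bar{N}$ should be kept moderate.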

A.3 The time complexity of the MURIA

The time complexity of Lines 3.5–3.5 is

$\displaystyle T_{\textit{MURIA}}^{1}(n,\bar{N})=O(1)$ (44)

The time complexity of Line 3.5 is

$\displaystyle T_{\textit{MURIA}}^{2}(n,\bar{N})=O(\bar{N})$ (45)

The time complexity of Lines 3.5–3.5 is

$\displaystyle T_{\textit{MURIA}}^{3}(n,\bar{N})=O(1)$ (46)

The time complexity of Lines 3.5–3.5 is

$\displaystyle T_{\textit{MURIA}}^{4}(n,\bar{N})=T_{\textit{MURIA}}(n-1,\bar{N})$ (47)

The time complexity of the MURIA is

$\displaystyle\begin{aligned}T_{\textit{MURIA}}(n,\bar{N})&=T_{\textit{MURIA}}^{1}(n,\bar{N})+T_{\textit{MURIA}}^{2}(n,\bar{N})+T_{\textit{MURIA}}^{3}(n,\bar{N})+T_{\textit{MURIA}}^{4}(n,\bar{N})\\
&=O(1)+O(\bar{N})+O(1)+T_{\textit{MURIA}}(n-1,\bar{N})\\
&=O(\bar{N})+T_{\textit{MURIA}}(n-1,\bar{N})\end{aligned}$ (48)

Equation (48) is the recursion formula of $T_{\textit{MURIA}}(n,\bar{N})$ .

$\displaystyle T_{\textit{MURIA}}(n,\bar{N})=O(\bar{N})+T_{\textit{MURIA}}(n-1,\bar{N})=2O(\bar{N})+T_{\textit{MURIA}}(n-2,\bar{N})=\ldots=(n-1)O(\bar{N})+T_{\textit{MURIA}}(1,\bar{N})$ (49)

$T_{\textit{MURIA}}(1,\bar{N})=O(1)$ ; therefore, the time complexity of the MURIA is

$\displaystyle T_{\textit{MURIA}}(n,\bar{N})=(n-1)O(\bar{N})+O(1)=O(n\bar{N})$ (50)
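The linear bound can be illustrated by counting comparisons along one root-to-leaf walk. This sketch assumes that each recursive step selects one of $\bar{N}$ branches by a linear scan and then descends one level; the function `walk_cost` and the answer encoding are hypothetical and stand in for the steps of the MURIA, which are not reproduced here.

```python
def walk_cost(n, N, answers):
    """Count comparisons for one root-to-leaf walk with n levels and N branches per node."""
    comparisons = 0
    for level in range(n - 1):
        # Linear scan over the branches until the given answer is matched.
        for branch in range(N):
            comparisons += 1
            if branch == answers[level]:
                break
    return comparisons + 1  # constant work at the last level


n, N = 6, 4
worst = walk_cost(n, N, answers=[N - 1] * (n - 1))  # answer is always the last branch
best = walk_cost(n, N, answers=[0] * (n - 1))       # answer is always the first branch
print(worst, best)
```

The worst case costs $(n-1)\bar{N}+1$ comparisons, the same shape as Eq. (50), so using a built model is cheap compared with creating it.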

A.4 The time complexity of the MEA

The time complexity of Lines 3.6–3.6 is

$\displaystyle T_{\textit{MEA}}^{1}(n,\tilde{m})=O(\tilde{m}n)$ (51)

The time complexity of Lines 3.6–3.6 is

$\displaystyle T_{\textit{MEA}}^{2}(n,\tilde{m})=O(\tilde{m}n)$ (52)

The time complexity of Lines 3.6–3.6 and 3.6–3.6 is

$\displaystyle T_{\textit{MEA}}^{3}(n,\tilde{m})=O(1)$ (53)

Thus, the time complexity of the MEA is

$\displaystyle T_{\textit{MEA}}(n,\tilde{m})=T_{\textit{MEA}}^{1}(n,\tilde{m})+T_{\textit{MEA}}^{2}(n,\tilde{m})+T_{\textit{MEA}}^{3}(n,\tilde{m})=O(\tilde{m}n)$ (54)
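The $O(\tilde{m}n)$ cost corresponds to a constant number of full passes over an $\tilde{m}\times n$ table. The sketch below shows two such passes followed by constant-time arithmetic; the tables `actual`, `predicted`, and `asked` and the two ratios are illustrative stand-ins only and do not reproduce Eqs. (22)–(29).

```python
def evaluate(actual, predicted, asked):
    """Two passes over m x n tables, then O(1) arithmetic (illustrative metrics only)."""
    m, n = len(actual), len(actual[0])

    # Pass 1: count entries that were inferred rather than asked.
    inferred = 0
    for i in range(m):
        for j in range(n):
            if not asked[i][j]:
                inferred += 1

    # Pass 2: count inferred entries whose predicted value matches the actual one.
    correct = 0
    for i in range(m):
        for j in range(n):
            if not asked[i][j] and predicted[i][j] == actual[i][j]:
                correct += 1

    reduction = inferred / (m * n)
    accuracy = correct / inferred if inferred else 1.0
    return reduction, accuracy


# Hypothetical data: 2 individuals, 3 attributes, first attribute asked directly.
actual = [[1, 0, 2], [0, 1, 1]]
predicted = [[1, 0, 2], [0, 0, 1]]
asked = [[True, False, False], [True, False, False]]
reduction, accuracy = evaluate(actual, predicted, asked)
print(reduction, accuracy)
```

Each pass touches every entry once, so the running time grows linearly in both the number of individuals and the number of attributes.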

A.5 The time complexity of the FPQM

The time complexity of the FPQM is

Acknowledgments

This work is supported by the National High Technology Research and Development Program (“863” Program) of China under Grant No. 2015AA016009 and the National Natural Science Foundation of China under Grant No. 61232005. The authors wish to thank Lei Yang from the Lime Family Company, who provided us with the evaluation data for the elderly individuals. The data presented herein were used only for academic research. The customer IDs have been anonymized, and the privacy of the elderly will not be invaded.

References

1. Aoyama, Suzuki, Onishi and Kuzuya, Physical and functional factors in activities of daily living that predict falls in community-dwelling older women, Geriatrics & Gerontology International 11(3) (2011), 348–357.

2. Arnow B.A., Hunkeler E.M., Blasey C.M., Lee, Constantino M.J., Fireman, Kraemer H.C., Dea, Robinson and Hayward, Comorbid depression, chronic pain, and disability in primary care, Psychosomatic Medicine 68(2) (2006), 262–268.

3. Aydeniz, Altındağ Ö., İrdesel, Madenci, Toraman, Turhan, Yağcı İ., Eskiyurt, Yazgan and Kutsal Y.G., Physical, functional and sociocultural parameters that predict fall in elderly: Multicenter study, Journal of Physical Medicine & Rehabilitation Sciences/Fiziksel Tıp ve Rehabilitasyon Bilimleri Dergisi 18(3) (2015).

4. Badia, Vieta and Gilabert, The bone metastases quality of life questionnaire, in: Handbook of Disease Burdens and Quality of Life Measures, Springer, 2010, pp. 195–207.

5. Cinaroglu, Comparison of performance of decision tree algorithms and random forest: An application on OECD countries health expenditures, International Journal of Computer Applications 138(1) (2016), 37–41.

6. Cruice, Worrall and Hickson, Personal factors, communication and vision predict social participation in older adults, Advances in Speech Language Pathology 7(4) (2005), 220–232.

7. Dima A.-L., Living with chronic pain - a longitudinal study of the interrelations between acceptance, emotions, illness perceptions and health status, PhD thesis, 2010.

8. Drummond F.J., Sharp, Carsin A.-E., Kelleher and Comber, Questionnaire order significantly increased response to a postal survey sent to primary care physicians, Journal of Clinical Epidemiology 61(2) (2008), 177–185.

9. Faulkner K.K. and Cogan, Measures of shame and conflict tactics: Effects of questionnaire order, Psychological Reports 66(3 suppl) (1990), 1217–1218.

10. Fernandez and Boyle G.J., Affective and evaluative descriptors of pain in the McGill Pain Questionnaire: Reduction and reorganization, The Journal of Pain 2(6) (2001), 318–325.

11. Gatz J.L., Tyas S.L., John P.S. and Montgomery, Do depressive symptoms predict Alzheimer’s disease and dementia? The Journals of Gerontology Series A: Biological Sciences and Medical Sciences 60(6) (2005), 744–747.

12. Gyorgy and Linder, Codecell convexity in optimal entropy-constrained vector quantization, IEEE Transactions on Information Theory 49(7) (2003), 1821–1828.

13. Han, Pei and Kamber, Data Mining: Concepts and Techniques, Elsevier, 2011.

14. Kersh B.C., Bradley L.A., Alarcón G.S., Alberts K.R., Sotolongo, Martin M.Y., Aaron L.A., Dewaal D.F., Domino M.L., Chaplin W.F. et al., Psychosocial and health status variables independently predict health care seeking in fibromyalgia, Arthritis Care & Research 45(4) (2001), 362–371.

15. Kitisomprayoonkul, Klaphajone and Kovindha, Thai short-form McGill Pain Questionnaire, Journal-Medical Association of Thailand 89(6) (2006), 846.

16. Kroh, Winter and Schupp, Using person-fit measures to assess the impact of panel conditioning on reliability, in: Public Opinion Quarterly, 2016, nfw025.

17. Liu and Zhu, On three types of covering-based rough sets via definable sets, in: 2014 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), IEEE, pp. 1226–1233.

18. Mahoney F.I., Functional evaluation: The Barthel Index, Maryland State Medical Journal 14 (1965), 61–65.

19. Mccoll, Eccles M.P., Rousseau N.S., Steen I.N., Parkin D.W. and Grimshaw J.M., From the generic to the condition-specific? Instrument order effects in quality of life assessment, Medical Care 41(7) (2003), 777–790.

20. McDowell, Measuring Health: A Guide to Rating Scales and Questionnaires, Oxford University Press, 2006.

21. Nijsten T.E., Sampogna, Chren M.-M. and Abeni D.D., Testing and reducing Skindex-29 using Rasch analysis: Skindex-17, Journal of Investigative Dermatology 126(6) (2006), 1244–1250.

22. Prieto, Alonso and Lamarca, Classical test theory versus Rasch analysis for quality of life questionnaire reduction, Health and Quality of Life Outcomes 1(1) (2003), 1.

23. Quinlan J.R., C4.5: Programs for Machine Learning, Elsevier, 2014.

24. Rippentrop A.E., Altmaier E.M., Chen J.J., Found E.M. and Keffala V.J., The relationship between religion/spirituality and physical health, mental health, and pain in a chronic pain population, Pain 116(3) (2005), 311–321.

25. Rosen, Cappelleri, Smith, Lipsky and Pena, Development and evaluation of an abridged, 5-item version of the International Index of Erectile Function (IIEF-5) as a diagnostic tool for erectile dysfunction, International Journal of Impotence Research 11(6) (1999), 319–326.

26. Singh and Gupta, Comparative study ID3, CART and C4.5 decision tree algorithm: a survey, Int J Adv Inf Sci Technol [Internet] 27 (2014), 97–103.

27. Vines S.W., Gupta, Whiteside, Dostal-Johnson and Hummler-Davis, The relationship between chronic pain, immune function, depression, and health behaviors, Biological Research for Nursing 5(1) (2003), 18–29.

28. Wang, Feng and Liu, MZ/T 039-2013, Ability Assessment for Elder Adults, Standards Press of China, 2014.


Table 3
The comparison with decision tree

Table 4
Investigation attributes

Table 5
Mean and standard deviation of the results under the four methods