Abstract
To improve the efficiency of elderly assessments, an influence-based fast preceding questionnaire model (FPQM) is proposed. Compared with traditional assessments, the FPQM optimizes questionnaires by reordering their attributes. The values of low-ranking attributes can be predicted by the values of the high-ranking attributes. Therefore, the number of attributes can be reduced without redesigning the questionnaires. A new function for calculating the influence of the attributes is proposed based on probability theory. Reordering and reducing algorithms are given based on the attributes’ influences. The model is verified through a practical application. The practice in an elderly-care company shows that the FPQM can reduce the number of attributes by 90.56% with a prediction accuracy of 98.39%. Compared with other methods, such as the Expert Knowledge, Rough Set and C4.5 methods, the FPQM achieves the best performance. In addition, the FPQM can also be applied to other questionnaires.
Introduction
Questionnaires have been widely used in various fields, including elderly assessments. Several questionnaires have been developed and are currently in extensive use to assess health-related quality of life (HRQOL)[20]. Aging is an increasingly serious social phenomenon in China, and there is a strong need for care services. Assessments of the elderly are essential for providing personalized services. Existing assessment methods are usually based on the Barthel Index[18] and the national industry standard for the ability assessment of elderly adults[28]. Many investigation attributes are needed to systematically obtain information about the elderly. The elderly are asked about multiple attributes in succession. These assessment methods are inefficient, and the order of the attributes is not reasonable. When there is a relationship between attributes, some unknown attributes can be predicted by known attributes, and a more reasonable order should be determined[8, 9, 19].
Classical Test Theory (CTT), Rasch Analysis (RA), decision rules, and expert judgment[4, 10, 15, 21, 22, 25] have been applied to reduce the length of health questionnaires. However, the removed attributes do provide additional information, and a more reasonable order of the attributes can also be considered. Correlation, multiple regression, factor analysis, cluster analysis, structural equation modelling, and hierarchical multiple regression [2, 7, 24, 27] can be used to determine the relationships among the attributes of health questionnaires. Certain attributes can indeed be predicted by other attributes using hierarchical logistic regression, correlation analysis, and binary logistic stepwise regression[1, 3, 6, 11, 14]. However, each study predicts only one attribute rather than multiple attributes simultaneously, and the attributes involved in each study are incomplete.
A solid mathematical definition of the problem is given. The fast preceding questionnaire model (FPQM) is proposed to solve the problem in five steps. First, the influence of one attribute on all other attributes is defined and calculated. Second, we traverse every investigation attribute and choose the attribute with the largest influence as the best attribute to split. Third, we create the FPQM with the best attribute. We traverse every value of the attribute, and the sub-dataset corresponding to each value is used to obtain the sub-model recursively. Then, the sub-model is attached to the full model, and the full model is obtained when the recursion ends. Fourth, the created FPQM is used for the real investigation. The value is asked for directly at the beginning of the real investigation because there is no prior information about the respondent. Certain investigation attributes can be inferred after sufficient information has been accumulated: when the confidence level is greater than the given threshold, the attribute does not need to be asked about. Fifth, we calculate the evaluation metrics and evaluate the FPQM.
This paper is organized as follows. Section 2 reviews related work. The fast preceding questionnaire model (FPQM) is introduced in Section 3. First, a solid mathematical definition of the problem is given. Then, an influence calculation formula, the best attribute to split choosing algorithm (BASCA), the fast preceding questionnaire model creating algorithm (FPQMCA), the model used for real investigation algorithm (MURIA), and the model evaluation algorithm (MEA) are presented. Section 4 shows the experimental results, presenting the experimental data; the evaluation metrics; the overall results of the FPQM; the comparison experiments with Expert Knowledge, Rough Set, and C4.5; and the factor analysis, which covers the number of elderly individuals, the number of investigation attributes, and the threshold. Section 5 concludes the paper.
Related work
Attribute reduction
Prieto et al.[22] present a parallel reduction of a 38-attribute questionnaire, the Nottingham Health Profile (NHP), to empirically compare Classical Test Theory (CTT) and Rasch Analysis (RA) results. CTT results in 20 attributes (4 dimensions), whereas RA results in 22 attributes (2 dimensions). Moreover, the attribute-total correlation ranges from 0.45–0.75 for NHP20 and from 0.46–0.68 for NHP22, while the reliability ranges from 0.82–0.93 and from 0.87–0.94, respectively.
Fernandez and Boyle [10] reduce and reorganize the McGill Pain Questionnaire (MPQ) using a 3-step decision rule for affective and evaluative descriptors of pain. With a minimum absolute frequency of 17 and a minimum relative frequency of 1/2 as the threshold values, the words of the MPQ are reduced from 78 to fewer than 20 on average. This reduction leads to a negligible loss of information transmitted. Moreover, Kitisomprayoonkul et al. [15] develop the Thai Short-Form McGill Pain Questionnaire (Th-SFMPQ).
Rosen et al. [25] develop an abridged five-attribute version (IIEF-5) of the 15-attribute International Index of Erectile Function (IIEF) to diagnose the presence and severity of erectile dysfunction (ED). The five attributes are selected based on the ability to identify the presence or absence of ED and on adherence to the National Institute of Health’s definition of ED. The IIEF-5 possesses favorable properties for detecting the presence and severity of ED.
Badia et al. [4] achieve a qualitative and quantitative reduction in the 179 expressions of the bone metastasis quality of life questionnaire (BOMET-QOL) with respect to clarity, frequency and importance with 15 experts. This phase, which is performed in two steps, results in the 35-attribute version of the BOMET-QOL. The initial reduction yields a 25-attribute questionnaire via factorial analysis. Similarly, the BOMET-QOL-25 is reduced to an integrated version of 10 attributes through a sample of 263 oncology patients. The BOMET-QOL is an accurate, reliable and precise 10-attribute instrument for assessing HRQOL.
Nijsten et al. [21] test and reduce Skindex-29 to Skindex-17 using Rasch Analysis. The Rasch Analysis of the combined emotion and social functioning subscale of Skindex-29 results in a 12-attribute psychosocial subscale. A total of five of the seven attributes are retained in a symptom subscale. Classical psychometric properties, such as the response distribution, attribute-rest correlation, attribute complexity, and internal consistency, of the two subscales of Skindex-17 are at least adequate. Skindex-17 is a Rasch-reduced version of Skindex-29, with two independent scores that can be used for the measurement of health-related quality of life (HRQOL) for dermatological patients.
The studies in [4, 10, 15, 21, 22, 25] remove some attributes directly and develop qualitative and quantitative reductions of health questionnaires using Classical Test Theory (CTT), Rasch Analysis (RA), decision rules, or experts. These questionnaires include the Nottingham Health Profile (NHP), McGill Pain Questionnaire (MPQ), 15-attribute International Index of Erectile Function (IIEF), bone metastasis quality of life questionnaire (BOMET-QOL), and Skindex-29. However, the removed attributes can provide additional information, and their values could instead be predicted from the remaining attributes. Meanwhile, a more reasonable order of the attributes is not considered.
Relationships among attributes
Dima [7] studies the interrelations between acceptance, emotions, illness perceptions and health status. The confirmatory analysis (employing a variety of statistical procedures, from correlation to multiple regression, factor analysis, cluster analysis and structural equation modelling) largely confirms the expected relations within and between domains and is also informative regarding the most suitable data reduction methods. An additional exploratory analysis focuses on identifying the comparative characteristics of acceptance, emotions, and illness perceptions in predicting health status metrics.
Arnow et al. [2] provide estimates of the prevalence and strength of association between major depression and chronic pain in a primary care population and examine the clinical burden associated with the two conditions alone and together. Data are collected by questionnaires assessing major depressive disorder (MDD), chronic pain, pain-related disability, somatic symptom severity, panic disorder, other anxiety, probable alcohol abuse, and health-related quality of life (HRQL). The instruments include the Patient Health Questionnaire, SF-8, and the Graded Chronic Pain Questionnaire. The conclusions are that chronic pain is common among those with MDD and that comorbid MDD and disabling chronic pain are associated with greater clinical burden than is MDD alone.
Rippentrop et al. [24] seek to better understand the relationships among religion/spirituality and physical health, mental health, and pain in 122 patients with chronic musculoskeletal pain. Hierarchical multiple regression analyses reveal significant associations between components of religion/spirituality and physical and mental health. Forgiveness, negative religious coping, daily spiritual experiences, religious support, and self-rankings of religious/spiritual intensity significantly predict mental health status. Religion/spirituality is unrelated to pain intensity and life interference due to pain. Religion/spirituality may have both costs and benefits for the health of those with chronic pain.
Vines et al. [27] determine the relationships between pain perceptions, immune function, depression and health behaviors and examine the effects of chronic pain on immune function using depression and health behaviors as covariates. Pain perceptions show positive significant correlations with depression (
Most of the attributes mentioned in [2, 7, 24, 27] are included in Table 4, such as acceptance, emotions, illness perceptions, and health status; depression, chronic pain, and clinical burden; religion/spirituality and physical health, mental health, and pain; and pain perceptions, immune function, depression and health behaviors. The investigation attributes in Table 4 differ from the attributes mentioned in the literature only in how they are expressed: the attributes in the literature are more conceptual. Applied methods include correlation, multiple regression, factor analysis, cluster analysis, structural equation modelling, and hierarchical multiple regression. The literature shows that relationships among these attributes do exist. However, the attributes covered by the relationships in each study are incomplete, and the relationships have not been well utilized for tasks of interest such as prediction.
Prediction
Kersh et al. [14] use psychosocial and health status variables independently to predict health care seeking for fibromyalgia. Subjects are administered 14 measures, which produce six domains of variables: background demographics and pain duration; psychiatric morbidity; and personality, environmental, cognitive, and health status factors. These domains are input into 4 different hierarchical logistic regression analyses to predict the status as patient or non-patient. The full regression model is statistically significant (
Aoyama et al. [1] use physical and functional factors in activities of daily living to predict falls in community-dwelling older women. Correlation analysis investigating associations among the scores of assessment scales and actual measurements of muscle strength and balance shows that there are significant correlations between handgrip strength and the Falls Efficacy Scale, Functional Reach test, Timed Up and Go test, Berg Balance Scale, Motor Fitness Scale, and Motor Functional Independence Measure in fallers and non-fallers. A binary logistic stepwise regression analysis reveals that only an inability of “being able to go up and down the staircase” in the Motor Fitness Scale remains a significant variable to predict falls.
Aydeniz et al. [3] predict falls in the elderly with physical, functional and sociocultural parameters. Falls are common in patients with weakness, fatigue, dizziness, and swelling in the legs and in subjects with appetite loss. Fallers have lower functional status than do non-fallers (
Gatz et al. [11] use depressive symptoms to predict Alzheimer’s disease and dementia. The Total Center for Epidemiologic Studies Depression (CES-D) score is a significant predictor of AD and dementia when categorized as a dichotomous variable according to the cutoff scores of 16 and 17; a CES-D cutoff of 21 is a significant predictor of AD and a marginally significant predictor of dementia. When analyzed as a continuous variable, the CES-D score is marginally predictive of AD and dementia. Neither participant-reported history of depression nor participant-reported duration of depression is significant in predicting AD or dementia.
Cruice et al. [6] predict social participation in older adults with personal factors, communication and vision. Assessments are individually conducted in a face-to-face interview situation with the primary researcher, who is a speech pathologist. Social participation is shown to be associated with vision, communication activities, age, education and emotional health. Naming and hearing impairments are not reliable predictors of social participation. It is concluded that professionals interested in maintaining and improving the social participation of older people should strongly consider these predictors in community-directed interventions.
Most of the attributes mentioned in [1, 3, 6, 11, 14] are also included in Table 4, such as psychosocial and health status variables and health care; physical, functional and sociocultural parameters and falls; depressive symptoms, Alzheimer’s disease and dementia; and personal factors, communication and vision, and social participation. Health care, falls, Alzheimer’s disease, dementia, and social participation are predicted using hierarchical logistic regression, correlation analysis, and binary logistic stepwise regression. This literature illustrates that certain attributes can indeed be predicted by other attributes. However, each study predicts only one attribute rather than multiple attributes simultaneously, and the attributes involved in each study are incomplete. Prediction across the complete set of attributes remains to be studied.
The relationship between the related works and this research, and the motivation drawn from them, are explained here. [4, 10, 15, 21, 22, 25] show that certain attributes in Table 4 can be reduced directly. The information of these attributes is redundant and contained in other attributes. [2, 7, 24, 27] provide further evidence that there is an inherent relationship between these attributes. [1, 3, 6, 11, 14] further show that certain attributes can be predicted by other attributes because of the underlying relationship. All the related works form the foundation of this research. The proposed fast preceding questionnaire model (FPQM) can achieve state-of-the-art performance only when there is an inherent relationship between attributes. If the relationship does not exist at all, no method can predict certain attributes from others. The relationship is the foundation of all possible methods, including the FPQM. [4, 10, 15, 21, 22, 25] reduce some attributes directly, while the FPQM predicts the values of the attributes and preserves these attributes. In addition, in [1, 3, 6, 11, 14], each study predicts only one attribute, while the FPQM can predict multiple attributes simultaneously.
Fast preceding questionnaire model (FPQM)
A solid mathematical definition of the problem is given, and the fast preceding questionnaire model (FPQM) is proposed to solve the problem in five steps.
We calculate, in order, the confidence level of the attribute by taking a value under the condition of another attribute taking a value, the influence of the attribute taking a value on another attribute, the influence of the attribute on another attribute, the influence of the attribute on all other attributes, and the attribute that has the largest influence on all other attributes.
We traverse every investigation attribute, every other attribute, every value of the attribute, and every value of the other attribute to calculate every influence. Finally, the influence of the investigation attribute on all other attributes can be calculated. Then, we logically choose the best attribute that has the largest influence on all other attributes.
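The traversal described above can be sketched in Python as follows. This is only an illustrative sketch: the confidence level is taken to be a conditional relative frequency, and the ways in which the per-value influences are aggregated (maximum over target values, frequency-weighted sum over source values, plain sum over other attributes) are assumptions made for this example, not the paper's Eqs (1)–(5). The attribute names and the tiny dataset are hypothetical.

```python
from collections import Counter

def confidence(rows, a_i, v, a_j, u):
    """Relative frequency of a_j == u among the rows where a_i == v."""
    matching = [r for r in rows if r[a_i] == v]
    if not matching:
        return 0.0
    return sum(1 for r in matching if r[a_j] == u) / len(matching)

def influence_of_value(rows, a_i, v, a_j):
    """ASSUMPTION: influence of (a_i == v) on a_j is the best confidence
    reachable over all values u of a_j."""
    values_j = {r[a_j] for r in rows}
    return max(confidence(rows, a_i, v, a_j, u) for u in values_j)

def influence_on_attribute(rows, a_i, a_j):
    """ASSUMPTION: weight each value v of a_i by its relative frequency."""
    counts = Counter(r[a_i] for r in rows)
    n = len(rows)
    return sum(c / n * influence_of_value(rows, a_i, v, a_j)
               for v, c in counts.items())

def total_influence(rows, a_i, attributes):
    """ASSUMPTION: influence on all other attributes is a plain sum."""
    return sum(influence_on_attribute(rows, a_i, a_j)
               for a_j in attributes if a_j != a_i)

def best_attribute(rows, attributes):
    """Attribute with the largest total influence (first one wins ties)."""
    return max(attributes, key=lambda a: total_influence(rows, a, attributes))

# Hypothetical toy data with three nominal attributes.
rows = [
    {"walk": "independent", "stairs": "independent", "bath": "independent"},
    {"walk": "independent", "stairs": "independent", "bath": "help"},
    {"walk": "help", "stairs": "help", "bath": "help"},
    {"walk": "help", "stairs": "help", "bath": "help"},
]
attributes = ["walk", "stairs", "bath"]
best = best_attribute(rows, attributes)
```

On this toy data, `walk` and `stairs` determine each other completely, so both score higher than `bath`; `max` keeps the first of the tied attributes.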
After the best attribute to split is chosen, we traverse every value of the attribute, and the sub-model is obtained recursively from the sub-dataset corresponding to that value. Then, we attach the sub-model to the full model, and the full model is obtained when the recursion ends.
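A minimal sketch of this recursion is given below, assuming the model is stored as nested dictionaries. The node layout (`attr`, `guess`, `conf`, `children`) and the `choose` callback are hypothetical names introduced for illustration; in the demonstration, `choose` is a stand-in that simply takes the first remaining attribute rather than the influence-based choice.

```python
from collections import Counter

def build_model(rows, attributes, choose):
    """Recursively build a nested-dict model.

    `choose(rows, attributes)` is a hypothetical callback returning the
    attribute to split on. Each node stores the most frequent value of that
    attribute in the current sub-dataset and its relative frequency.
    """
    if not rows or not attributes:
        return None
    attr = choose(rows, attributes)
    counts = Counter(r[attr] for r in rows)
    value, count = counts.most_common(1)[0]
    node = {"attr": attr, "guess": value,
            "conf": count / len(rows), "children": {}}
    remaining = [a for a in attributes if a != attr]
    for v in counts:
        # Sub-dataset for this value; the sub-model is attached under it.
        subset = [r for r in rows if r[attr] == v]
        node["children"][v] = build_model(subset, remaining, choose)
    return node

def first_remaining(rows, attributes):
    """Stand-in chooser (ASSUMPTION): not the influence-based choice."""
    return attributes[0]

# Hypothetical toy data.
rows = [
    {"walk": "independent", "stairs": "independent", "bath": "independent"},
    {"walk": "independent", "stairs": "independent", "bath": "help"},
    {"walk": "help", "stairs": "help", "bath": "help"},
    {"walk": "help", "stairs": "help", "bath": "help"},
]
model = build_model(rows, ["walk", "stairs", "bath"], first_remaining)
```

The recursion ends when a path has used every attribute, so the depth of the resulting tree equals the number of attributes.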
Now, the FPQM can be used to investigate a new respondent. At the beginning of the real investigation, there is no prior information about the respondent; therefore, we ask for the value directly. After sufficient information has been accumulated, some investigation attributes can be inferred. If the confidence level is larger than the given threshold, then the attribute does not need to be asked about.
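The investigation step can be sketched as a walk down the model, under the same assumed nested-dictionary layout (`attr`, `guess`, `conf`, `children`), which is a hypothetical representation rather than the paper's. The top attribute is always asked; below it, an attribute is inferred only when its stored confidence is strictly larger than the threshold. The hand-written model and the respondents are illustrative.

```python
def investigate(model, ask, threshold):
    """Walk the model for one respondent.

    `ask(attr)` returns the respondent's true value for `attr`.
    Returns the completed record and the list of attributes actually asked.
    """
    record, asked = {}, []
    node, is_top = model, True
    while node is not None:
        attr = node["attr"]
        if is_top or node["conf"] <= threshold:
            record[attr] = ask(attr)        # not confident enough: ask
            asked.append(attr)
        else:
            record[attr] = node["guess"]    # confident enough: infer
        is_top = False
        # An unseen value or a leaf ends the walk (children.get gives None).
        node = node["children"].get(record[attr])
    return record, asked

# Hypothetical hand-built model over three attributes.
model = {
    "attr": "walk", "guess": "independent", "conf": 0.5,
    "children": {
        "help": {
            "attr": "stairs", "guess": "help", "conf": 1.0,
            "children": {"help": {"attr": "bath", "guess": "help",
                                  "conf": 1.0, "children": {}}},
        },
        "independent": {
            "attr": "stairs", "guess": "independent", "conf": 1.0,
            "children": {"independent": {"attr": "bath",
                                         "guess": "independent",
                                         "conf": 0.5, "children": {}}},
        },
    },
}

person_a = {"walk": "help", "stairs": "help", "bath": "help"}
person_b = {"walk": "independent", "stairs": "independent", "bath": "help"}
record_a, asked_a = investigate(model, person_a.get, threshold=0.9)
record_b, asked_b = investigate(model, person_b.get, threshold=0.9)
```

For `person_a` only the top attribute is asked, whereas for `person_b` the low-confidence `bath` node forces a second question.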
After the FPQM is used to investigate the new respondent, the evaluation metrics can be calculated. The FPQM can be evaluated based on these metrics.
Illustration of the five steps.
The five steps are also presented in the form of a graph, as shown in Fig. 1. Step 1: Calculate the influence with Eqs (1)–(5). Step 2: Choose the best attribute to split with Algorithm 3.3 and the calculated influence in Step 1. Step 3: Create the FPQM with Algorithm 3.4, and at every recursion step, call Algorithm 3.3 to choose the best attribute to split. Step 4: Use the FPQM for the real investigation with Algorithm 3.5 after the FPQM is created in Step 3. Step 5: Evaluate the model with Algorithm 3.6.
[Definitions of the notation and evaluation metrics, including the average accuracy rate (AAR) and the average reduction rate (ARR); formulas not recoverable.]
The problem is defined as follows.
Influence calculation formula
To create the model from the training dataset
.
The confidence level of the investigation attribute
where
.
The influence of
where
.
The influences of
where
.
The influence of
where
.
The investigation attribute
When creating the FPQM, it is necessary to choose the best attribute to split. When the depth of the created model reaches
Algorithm 3.3: Best attribute to split choosing algorithm (BASCA)
Fast preceding questionnaire model creating algorithm (FPQMCA)
Now, the FPQM can be created with the above groundwork. After
Algorithm 3.4: Fast preceding questionnaire model creating algorithm (FPQMCA)
Model used for real investigation algorithm (MURIA)
With the created FPQM, the new person in the testing dataset can be investigated quickly. At the beginning of the real investigation, there is no information about the respondent; therefore, all we can do is ask about the attribute directly. After sufficient information has been accumulated, some investigation attributes can be inferred; otherwise, we continue asking about attributes.
Let Ind be the index indicating whether the current investigation attribute is the top attribute.
Algorithm 3.5: Model used for real investigation algorithm (MURIA)
Model evaluation algorithm (MEA)
Now, the FPQM should be evaluated to determine its performance. Various evaluation metrics are calculated by the model evaluation algorithm (MEA). First,
Algorithm 3.6: Model evaluation algorithm (MEA)
When
An example
The following example illustrates Definitions 1–25 and Algorithms 3.3–3.6.
When
By Eq. (3.2),
By Eq. (3.2),
Example training dataset
Example testing dataset
By Eq. (3.2),
By Eq. (5),
Equations (6)–(15) also show how Algorithm 3.3 performs.
As for Algorithm 3.4, a node M is created at the beginning.
The obtained FPQM for the example training dataset.
The created model is shown in Fig. 2.
After the model is created, Algorithm 3.5 can be used for a real investigation on the testing dataset.
After Algorithm 3.5 finishes investigating the testing dataset
AAR, ARR, and
The fast preceding questionnaire model (FPQM) resembles a Decision Tree to some extent, but the two are different models. There are many types of Decision Tree algorithms; notable ones include ID3, C4.5, C5.0, CART, CHAID, MARS, and Conditional Inference Trees. ID3, C4.5 and CART are chosen as representative Decision Tree algorithms, and the FPQM is compared with these three algorithms in several aspects[5, 13, 26].
The comparison on measures
The algorithms for constructing decision trees choose, at each step, the attribute that best splits the dataset. Different Decision Tree algorithms use different metrics for determining the “best”. ID3 uses information gain, C4.5 uses the gain ratio, and CART uses the Gini index, whereas the FPQM uses Influence, which is defined in Definitions 21–25.
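For reference, the three decision-tree measures named above can be computed as in the following sketch, which uses their standard textbook definitions; the FPQM's Influence measure is not reproduced here. The helper names, attribute names, and toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of nominal labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of nominal labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def partition(rows, attr, target):
    """Group the target labels by the value of the split attribute."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    return list(groups.values())

def information_gain(rows, attr, target):
    """Entropy reduction of `target` after splitting on `attr` (ID3)."""
    n = len(rows)
    remainder = sum(len(g) / n * entropy(g)
                    for g in partition(rows, attr, target))
    return entropy([r[target] for r in rows]) - remainder

def gain_ratio(rows, attr, target):
    """Information gain normalized by the split information (C4.5)."""
    split_info = entropy([r[attr] for r in rows])
    return information_gain(rows, attr, target) / split_info if split_info else 0.0

# Hypothetical toy data.
rows = [
    {"walk": "independent", "bath": "independent"},
    {"walk": "independent", "bath": "help"},
    {"walk": "help", "bath": "help"},
    {"walk": "help", "bath": "help"},
]
ig = information_gain(rows, "walk", "bath")
gr = gain_ratio(rows, "walk", "bath")
g = gini([r["bath"] for r in rows])
```

All three measures score a split against a single target attribute, which is one way to see the difference from a measure that aggregates over all other attributes.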
The comparison on solved problems
ID3 and C4.5 solve classification problems, and CART solves both classification and regression problems. The FPQM, on the other hand, is designed to optimize questionnaires by reordering the attributes and reducing the number of attributes that must be asked, by predicting the low-ranking attributes. The FPQM and Decision Tree algorithms therefore address entirely different problems.
The comparison on model forms
The model form of the FPQM looks very similar to that of Decision Tree algorithms; both are in tree forms, as shown in Fig. 3.
A part of the obtained FPQM.
The attribute types that the FPQM and Decision Tree can handle are different. ID3 can only handle nominal types, C4.5 and CART can handle both nominal and numeric types, and the FPQM can only handle nominal attribute types.
The comparison on splits
The split methods of the FPQM and Decision Tree algorithms differ. ID3 and C4.5 use multiway splits, CART uses 2-way splits, and the FPQM uses multiway splits. Thus, CART can only create binary trees, whereas ID3, C4.5 and the FPQM can create general trees.
The comparisons between the FPQM and the Decision Tree algorithms in all these aspects are listed in Table 3.
The comparison with decision tree
There are many loops and recursive steps in the four algorithms constituting the FPQM; the time complexity of the FPQM is analyzed below. The derivation of the time complexity is described in detail in the Appendix.
The time complexity of the BASCA
The time complexity of the BASCA is
where
The time complexity of the FPQMCA is
The time complexity of the MURIA is
The time complexity of the MEA is
where
The time complexity of the FPQM is
Experimental data
The experiments are based on actual data from the Lime Family Limited Company (Lime Family). Lime Family focuses on the pension service field and is the nationwide leading provider of high-quality home care services. Lime Family originally required 45 investigation attributes to assess a newcomer. The investigation attributes are based on the Barthel Index [18] and an ability assessment for elderly adults [28]. A total of 45 investigation attributes seems excessive for the elderly individuals who are asked to give the value of the attributes one by one, and the order of the attributes is not reasonable. Therefore, every assessment requires approximately 15–20 minutes, which is also excessive, resulting in customer churn in practice.
Noise
There are three main classes of noise: spurious readings, measurement error, and background data. Some values of the Height and Weight attributes are 0, and some values of the Age attribute are 2015. These are spurious readings, which are easily recognized as incorrect values. Height, Weight and Age are numeric attributes, and thus, these noisy values are replaced with the sample mean.
Missing values
Certain values of the Religion, Ground Walking, and Up Down Stairs attributes are missing. Religion is a nominal attribute, and Ground Walking and Up Down Stairs are ordinal attributes; thus, these missing values are replaced with the sample mode.
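The two replacement rules above (sample mean for spurious numeric readings, sample mode for missing nominal or ordinal values) can be sketched as follows. The validity range for Age and the example values are assumptions made for illustration; the paper does not state the exact criteria, and the mean is assumed to be computed over the valid readings only.

```python
from statistics import mean, mode

def replace_spurious_with_mean(values, is_valid):
    """Replace values failing `is_valid` with the mean of the valid ones."""
    valid = [v for v in values if is_valid(v)]
    m = mean(valid)
    return [v if is_valid(v) else m for v in values]

def fill_missing_with_mode(values, missing=None):
    """Replace missing entries with the most frequent observed value."""
    present = [v for v in values if v is not missing]
    m = mode(present)
    return [m if v is missing else v for v in values]

# Hypothetical readings: an Age of 2015 and a Height of 0 are spurious.
ages = replace_spurious_with_mean([72, 2015, 80, 76],
                                  lambda a: 0 < a < 120)   # assumed range
heights = replace_spurious_with_mean([165, 0, 170, 160],
                                     lambda h: h > 0)
religion = fill_missing_with_mode(["none", None, "Buddhism", "none"])
```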
After preprocessing, including discretization, handling missing values, addressing noise data, and textual data processing, all data are converted into nominal data. The investigation attributes can be observed in Table 4.
Investigation attributes
The entire experiment is performed using a 64-bit build of Python on a MacBook Pro with a 2.2 GHz Intel Core i7 CPU and 16 GB of 1600 MHz DDR3 memory.
To evaluate the performance of the FPQM, the accuracy rate, average accuracy rate, standard deviation of the accuracy rate, reduction rate, average reduction rate, and standard deviation of the reduction rate are defined as follows.
AAR is the average accuracy rate of
SAR is the standard deviation of the accuracy rate and describes the volatility of the accuracy rate.
ARR is the average reduction rate of
SRR is the standard deviation of the reduction rate and describes the volatility of the reduction rate.
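As a rough illustration of how AAR, SAR, ARR, and SRR could be computed from per-individual results, consider the sketch below. It assumes that the accuracy rate of one individual is the fraction of all attributes whose recorded value matches the true value, that the reduction rate is the fraction of attributes that were not asked, and that the population standard deviation is used; these conventions and the example data are assumptions, since the exact definitions are not reproduced here.

```python
from statistics import mean, pstdev

def accuracy_rate(record, truth):
    """ASSUMPTION: fraction of all attributes recorded correctly."""
    return sum(1 for a in truth if record.get(a) == truth[a]) / len(truth)

def reduction_rate(n_asked, n_total):
    """ASSUMPTION: fraction of attributes that did not have to be asked."""
    return 1 - n_asked / n_total

def summarize(rates):
    """Mean and (population) standard deviation of a list of rates."""
    return mean(rates), pstdev(rates)

# Hypothetical results for two individuals over four attributes.
truth_1 = {"a": "x", "b": "y", "c": "x", "d": "y"}
record_1, asked_1 = {"a": "x", "b": "y", "c": "x", "d": "y"}, 1
truth_2 = {"a": "y", "b": "y", "c": "x", "d": "x"}
record_2, asked_2 = {"a": "y", "b": "y", "c": "x", "d": "y"}, 2

ar = [accuracy_rate(record_1, truth_1), accuracy_rate(record_2, truth_2)]
rr = [reduction_rate(asked_1, 4), reduction_rate(asked_2, 4)]
aar, sar = summarize(ar)
arr, srr = summarize(rr)
```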
When
SF is the standard deviation of the
The depth of the obtained FPQM is 45 because there are 45 investigation attributes in total. There is a corresponding rule with every node in the FPQM, and there are a total of 6072 rules. A part of the obtained FPQM is shown in Fig. 3.
A reasonable order for the investigation attributes can be obtained from the FPQM. Transfer Bed Chair is the first investigation attribute. When there is a new individual
The FPQM is evaluated using the metrics AR, RR, and
AR, RR and F of the FPQM.
AR, RR and F of the Expert Knowledge method.
To demonstrate that the FPQM is a good model for fast preceding questionnaires, Expert Knowledge, Rough Set [12, 17], and C4.5[13, 23] are applied to solve the same problem. The results of the FPQM and those from these three methods are compared.
Expert Knowledge
AR, RR and
Rough Set
The Rough Set method is also applied to solve the same problem in four steps. Step 1: First, take one attribute as the decision attribute and the other attributes as the condition attributes. The correlation degree between each condition attribute and the decision attribute is calculated, and any attribute whose correlation degree is less than a given threshold is deleted. Second, calculate the correlation degree between each condition attribute and all the other condition attributes. Delete the condition attributes whose correlation degree with this particular condition attribute is larger than their correlation degree with the decision attribute. Step 2: Generate the decision rules with the reliability and coverage degrees. Let the decision equivalent class (DEC) be the collection of elderly individuals who have the same assessment result. Calculate the reliability and coverage degree for each DEC and delete those classes whose degree is less than a given threshold. The classes that may generate rarely appearing rules and uncertain rules can be deleted in this manner. The decision rules can be created using each of the remaining decision equivalent classes. Step 3: Sort the investigation attributes by the coverage degree. Merge the rules according to the coverage degree and create the assessment sequence of attributes by sorting the coverage degrees in descending order. Step 4: Use the merged rules and assessment sequence to simulate the assessment process for the new elderly individual in
When using the Rough Set method, AR, RR and
AR, RR and F of the Rough Set method.
Now, we solve the same problem by applying the C4.5 method. We take each of the 45 investigation attributes in V in turn as the terminal node, with the other attributes as internal nodes. C4.5 yields one decision tree per terminal attribute, giving 45 decision trees for the 45 attributes. Then, we evaluate the new individual
AR, RR and
AR, RR and F of the C4.5 method.
The AR comparison for the FPQM, Expert Knowledge, Rough Set, and C4.5 methods.
The FPQM, Expert Knowledge, Rough Set, and C4.5 methods are compared with respect to AR, RR and
Figure 8 shows the AR comparison for the FPQM, Expert Knowledge, Rough Set, and C4.5 methods. The FPQM achieves the best AR performance, and the volatility is the lowest. Expert Knowledge achieves a slightly better AR than the Rough Set and C4.5 methods, but these three methods have almost the same volatility.
The RR comparison of the FPQM, Expert Knowledge, Rough Set, and C4.5 methods is shown in Fig. 9. The FPQM also achieves the best RR, which is far higher than those of the other three methods. The Rough Set method achieves the second best value and has the greatest volatility. By comparison, the RRs of the Expert Knowledge and C4.5 methods are the lowest and approximately equal.
Mean and standard deviation of the results under the four methods
The RR comparison of the FPQM, Expert Knowledge, Rough Set, and C4.5 methods.
The F comparison of the FPQM, Expert Knowledge, Rough Set, and C4.5 methods.
Table 5 shows the mean and standard deviation of the four methods with respect to AR, RR, and
The FPQM has the largest mean and the smallest standard deviation with respect to AR, RR and
AAR, ARR, AF, SAR, SRR, and SF with increasing number of elderly individuals.
AAR, ARR, AF, SAR, SRR, and SF with increasing number of investigation attributes.
AAR, ARR, AF, SAR, SRR, and SF with increasing threshold 
Some of the improvements in the results are not highly significant when the FPQM is compared with other methods, such as the Rough Set method. We can explain this as follows. The Rough Set method is also an excellent method for attribute reduction. Both the Rough Set and FPQM methods can capture the internal relationships among the attributes listed in Table 4, and predict multiple attributes simultaneously. Therefore, the FPQM does not achieve highly significant improvements compared with the Rough Set method.
Factor analysis
Three factors are analyzed: the number of elderly individuals, the number of investigation attributes, and the threshold
As illustrated in Fig. 11, ARR decreases steadily and with minimal volatility when the number of the elderly individuals increases. AAR and AF are volatile and present a slight wavelike decrease overall. SRR increases continuously with a small slope. SF is highly similar to SAR because SRR is so small that SF is mainly determined by SAR.
With an increasing number of investigation attributes, AAR, ARR, and AF present a fluctuating increasing trend, as shown in Fig. 12. ARR increases more than AAR and AF do and shows slightly larger volatility. SAR, SRR, and SF decrease with some volatility, and SRR falls fairly steadily in the second half of the curve. SRR has minimal impact on SF, and SF has the same general trend as SAR. Briefly, more investigation attributes produce better results.
Notice that some parts of the AAR, ARR, AF, SAR, SRR, and SF curves are flat, which simply embodies the intermittent nature of the threshold, as illustrated in Fig. 13. Only the threshold
Conclusion
Traditional questionnaires are used in a manner whereby respondents are asked one question after another. The two main problems are inefficiency and an unreasonable order of the questions. In this paper, the fast preceding questionnaire model (FPQM) is proposed to solve these problems in five steps, as shown in Fig. 1. The influence calculation formula, best attribute to split choosing algorithm (BASCA), fast preceding questionnaire model creating algorithm (FPQMCA), model used for real investigation algorithm (MURIA), and model evaluation algorithm (MEA) are all presented. The experimental section presents the experimental data; the evaluation metrics; the overall results of the FPQM; the comparison experiments with the Expert Knowledge, Rough Set, and C4.5 methods; and factor analysis, which includes the number of elderly individuals, the number of investigation attributes, and the threshold. When the FPQM is applied by the Lime Family company, after asking about certain attributes, most of the remaining attributes can be inferred automatically with a high accuracy and reduction rate and low volatility. To ensure 100% correctness, the model can be used as a preceding questionnaire whose inferred values the elderly individual then verifies, which is much faster than asking every question directly.
Further work will focus on three points. After the elderly individual is well assessed, a thorough study should be performed on determining an appropriate health care plan to be recommended automatically according to the assessment result. Then, the effect of the health care plan should be assessed as well. Moreover, as the amount of data on elderly individuals continues to increase, distributed platforms such as Hadoop and Spark will be considered.
The time complexity of the FPQM
The time complexity of the BASCA
As shown in Algorithm 3.3, there are two loops on the attributes (Lines 3.3 and 3.3) and two loops on the values of the attributes (Lines 3.3 and 3.3) in the BASCA. As shown in Definition 3,
where
Line 3.3 calculates
Thus, the time complexity of the BASCA is
The time complexity of Lines 3.4–3.4 is
The time complexity of Line 3.4 is
The time complexity of Lines 3.4–3.4 is
The time complexity of Lines 3.4–3.4 is
The time complexity of the FPQMCA is
Equation (37) is the recursion formula of
Let
Plugging Eq. (41) into Eq. (38),
The time complexity of the FPQMCA is
The time complexity of Lines 3.5–3.5 is
The time complexity of Line 3.5 is
The time complexity of Lines 3.5–3.5 is
The time complexity of Lines 3.5–3.5 is
The time complexity of the MURIA is
Equation (48) is the recursion formula of
The time complexity of Lines 3.6–3.6 is
The time complexity of Lines 3.6–3.6 is
The time complexity of Lines 3.6–3.6 and 3.6–3.6 is
Thus, the time complexity of the MEA is
The time complexity of the FPQM is
Acknowledgments
This work is supported by the National High Technology Research and Development Program (“863” Program) of China under Grant No. 2015AA016009 and the National Natural Science Foundation of China under Grant No. 61232005. The authors wish to thank Lei Yang from the Lime Family Company, who provided us with the evaluation data for the elderly individuals. The data presented herein were used only for academic research. The customer IDs have been anonymized, and the privacy of the elderly individuals is not compromised.
