Abstract
Introduction
Recent years have seen the increased adoption of smartphones, a type of handheld computer, by healthcare professionals as well as by the general public. 1 The number of smartphone users who start and finish their day with their smartphone is growing every day. As of 2013, it was estimated that 85 billion applications (apps) would be downloaded. 2 Smartphone app downloads are growing rapidly, resulting in nearly 1 billion apps downloaded monthly. 3 Among them, many are related to healthcare. 4
Various types of healthcare smartphone apps are available, including not only apps to manage or monitor one's blood glucose level or blood pressure, exercise, or diet, but also apps for cancer patients or patients suffering from chronic diseases, apps specifically for female healthcare, or even those enabling the keeping of personal health records. 5,6
However, many of the users of these healthcare apps reportedly delete the apps soon after installing them due to a lack of content or inconvenience when using them. 5 Various healthcare apps are produced without validation of their efficiency, quality, or accuracy. 7
An evaluation tool in this area would be an essential guideline for evaluating smartphone contents systematically. It is important to derive evaluating factors from evaluation tools on healthcare Web sites based on the distinct characteristics of healthcare apps, as smartphone apps can be regarded as a downscaled version of Web sites. 8 Thus far, studies that evaluate healthcare smartphone apps are in their early stages. Related studies mostly focus on recent trends or on issues related to usability, safety, or acceptability. However, an evaluation tool for healthcare smartphone apps should be developed systematically. In this context, the purpose of this study is to develop and test an evaluation tool for healthcare smartphone apps and to assess its validity and reliability.
Materials and Methods
Research Design
This study is a descriptive and psychometric assessment of a new evaluation tool that has been developed and verified for use in assessing the utility of healthcare smartphone apps.
Research Measures
Developing a provisional version of a healthcare smartphone app evaluation tool
The evaluating factors were derived from an analysis of previous studies. 9–13 The derived factors were all noted commonly in previous studies, thought to be important regarding the reliability of health information and consumer satisfaction, or considered for inclusion for other reasons to evaluate healthcare smartphone apps effectively. The four evaluating factors were the contents aspects, community aspects, design aspects, and technological aspects.
The subevaluating factors were derived from an analysis of previous studies related to the four evaluating factors. 9–16 The contents aspects consisted of accuracy, understandability, and objectivity. Community aspects were made up of reactivity and participation. Design aspects consisted of consistency, the suitability of the design, and the suitability of the vocabulary. Technological aspects were the security and stability of the system.
Verifying the content validity by five experts
Five professional evaluators, all of whom are experts in the fields of medical informatics, nursing informatics, and healthcare business, verified the content validity of the provisional version of the healthcare smartphone app evaluation tool. The number of experts was determined based on previous research that reported that it was desirable for more than 3 and fewer than 10 experts to verify content validity. 17 An analysis of evaluating items for healthcare smartphone apps was conducted in order to verify the suitability of the evaluating factors, subevaluating factors, and evaluating items. For a quantitative evaluation of the content validity, the content validity index (CVI) and a 4-point Likert scale (1=“unrelated,” 2=“cannot evaluate the relativity without editing the item,” 3=“related but editing is needed,” and 4=“highly related and described concisely”) were used. 18 The five experts answered closed-ended questions.
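As an illustration of the quantitative step, the item-level CVI can be sketched as follows. The sketch assumes the common definition of the index as the proportion of experts who rate an item 3 or 4 on the 4-point scale; the text does not spell out the formula, so this definition, the function names, and the example ratings are assumptions rather than the study's data. The cutoffs are the ones stated under Data Analysis, and the label "intermediate" for values between them is hypothetical.

```python
from typing import Sequence


def item_cvi(ratings: Sequence[int]) -> float:
    """Proportion of experts who rated the item 3 or 4 (assumed CVI definition)."""
    if not ratings:
        raise ValueError("at least one expert rating is required")
    if any(r not in (1, 2, 3, 4) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 4")
    relevant = sum(1 for r in ratings if r >= 3)
    return relevant / len(ratings)


def classify_cvi(cvi: float) -> str:
    """Apply the cutoffs stated under Data Analysis (above 0.8, below 0.5)."""
    if cvi > 0.8:
        return "highly valid"
    if cvi < 0.5:
        return "invalid"
    return "intermediate"  # hypothetical label; the text names only the two extremes


# Hypothetical ratings from five experts for a single item.
example_ratings = [4, 4, 3, 4, 3]
example_cvi = item_cvi(example_ratings)
print(example_cvi, classify_cvi(example_cvi))
```

With five experts, an item on which exactly four agree scores 0.8 and therefore does not clear a strict "above 0.8" cutoff; whether the boundary was treated as inclusive is not stated in the text.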
Verifying the construct validity and reliability
To verify the construct validity and reliability, app users evaluated the provisional version of the healthcare smartphone app evaluation tool after it had been modified and edited according to the results of the experts' evaluations. In total, 200 responses were collected using a convenience sampling method. The number of respondents was determined based on a previous study that suggested that the proper number of samples to verify the final draft of a questionnaire ranges from 100 to 200 to properly represent the population. 18
Data Analysis
SPSS version 20.0 software and AMOS version 21.0 (both from IBM, Armonk, NY) were used as follows:
• A frequency analysis was done to test the general characteristics of the subjects.
• The CVI was calculated to determine the content validity. A CVI above 0.8 was considered to be highly valid, whereas a score below 0.5 was considered to be invalid. 18
• A confirmatory factor analysis was used to verify the construct validity.
• Cronbach's alpha, which examines the interitem correlations between the items on a questionnaire, was calculated to verify the reliability.
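For readers unfamiliar with the reliability statistic, a minimal sketch of Cronbach's alpha is given here. It uses the standard formula, alpha = k/(k−1) × (1 − sum of item variances / variance of total scores), with sample variances. The study computed alpha in SPSS; the function name and the small response matrix are hypothetical and serve only to show the calculation.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of scores."""
    if len(responses) < 2:
        raise ValueError("at least two respondents are required")
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer every item")

    # Sample variance of each item (column) across respondents.
    item_variances: List[float] = [variance(col) for col in zip(*responses)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: four respondents, two items.
example = [[2, 3], [3, 2], [1, 1], [4, 4]]
print(round(cronbach_alpha(example), 3))
```

Values closer to 1 indicate stronger interitem consistency, which is how the coefficients reported in the Results are read.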
Results
Developing the Provisional Version of the Healthcare Smartphone App Evaluation Tool
Eight previous studies were reviewed, and evaluating factors and subevaluating factors for the evaluation tool were derived from them. Also, evaluating items were derived after review of a total of 18 previous studies. As a result of this process, the first version of the healthcare smartphone app evaluation tool had 35 evaluating items, 4 evaluating factors, and 10 subevaluating factors.
Verifying the Content Validity by the Five Experts
The provisional version of the healthcare smartphone app evaluation tool was modified and edited by the five experts. These experts were asked to review and edit the first version of the healthcare smartphone app evaluation tool. As a result, they removed 2 evaluating items and left 33 items. Also, subevaluating factors consisted of three questions on accuracy, four on understandability, four on objectivity, two on reactivity, two on participation, three on consistency, six on the suitability of the design, three on the accuracy of the vocabulary, four on security, and two on the stability of the system.
Verifying the Construct Validity and Reliability
From September 25 to October 4, 2013, a total of 200 undergraduate and graduate students whose majors were nursing science or medicine participated in the verification of the construct validity and reliability of the tool. The researcher of the present study distributed and collected the 200 questionnaires in person. All 200 questionnaires were analyzed after a data cleaning process.
Characteristics of the general evaluators
The 200 evaluators were made up of 11 (5.5%) male respondents and 189 (94.5%) female respondents. iOS (Apple, Cupertino, CA) users numbered 45 (22.5%), whereas Android™ (Google, Mountain View, CA) users numbered 155 (77.5%). With regard to the level of education of the evaluators, the respondents were undergraduate or graduate students who were nursing or medical students. All respondents had to have a strong medical background and be knowledgeable of medical content.
Verifying the construct validity
A confirmatory factor analysis was used to verify the construct validity, that is, to determine whether the 33 evaluating items that remained after the content validity review were properly developed according to the four evaluating factors.
Below are the results of the analysis: questions with negative values or those with a squared multiple correlations score below 0.4 were eliminated. A lack of fit with the model was detected; therefore, another confirmatory factor analysis was conducted to improve the model fit. As a result, the question with the lowest squared multiple correlations score was deleted (Table 1).
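The screening rule described above can be sketched as a simple filter. The sketch assumes that "negative values" refers to a negative standardized estimate for an item; the text does not say which quantity was negative, so that reading is an assumption. The class, the function names, and all numbers are hypothetical and are not the study's AMOS output.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ItemFit:
    name: str
    estimate: float  # standardized estimate (assumed meaning of "negative values")
    smc: float       # squared multiple correlation


def screen_items(items: List[ItemFit], smc_cutoff: float = 0.4) -> List[ItemFit]:
    """Keep items with a non-negative estimate and an SMC of at least the cutoff."""
    return [i for i in items if i.estimate >= 0 and i.smc >= smc_cutoff]


def drop_lowest_smc(items: List[ItemFit]) -> List[ItemFit]:
    """Remove the single item with the lowest SMC, as in the follow-up analysis."""
    if not items:
        return []
    weakest = min(items, key=lambda i: i.smc)
    return [i for i in items if i is not weakest]


# Hypothetical fit statistics for four items.
items = [
    ItemFit("q1", 0.72, 0.52),
    ItemFit("q2", -0.10, 0.45),  # negative estimate, removed
    ItemFit("q3", 0.55, 0.30),   # SMC below 0.4, removed
    ItemFit("q4", 0.66, 0.44),
]
kept = screen_items(items)
print([i.name for i in kept])
print([i.name for i in drop_lowest_smc(kept)])
```

In practice the model would be refitted after each removal, because the estimates and SMC values change when an item is dropped; the filter only shows the decision rule.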
Table 1. Results of the Confirmatory Factor Analysis
App, application; CR, composite reliability; SE, standard error.
Verifying the reliability
The reliability was verified to determine if the complete evaluation tool and each evaluation question had satisfactory interitem consistency. The reliability of all of the evaluating items was high (0.905). Also, the reliability for the contents of factor I (0.840), the interface design of factor II (0.891), and the technical reliability of factor III (0.870) were high (Table 2).
Table 2. Evaluating the Reliability
Survey: Evaluation tool given to participants to rate healthcare smartphone apps
User documentation needs to be offered for the correct use of the evaluation tool for healthcare smartphone apps.
The expected primary users of this evaluation tool are healthcare professionals. The purpose of this study was to develop and objectively evaluate the evaluation tool with regard to various aspects of healthcare smartphone apps. The scoring system and the analysis method were as follows: a 4-point Likert scale (0=“not at all,” 1=“a little,” 2=“a fair amount,” and 3=“a lot”) was used, and the total score was the sum of the scores of the 23 items. The total score had a possible range of 0 to 69. From 0 to 23 meant “poor,” from 24 to 46 meant “average,” and from 47 to 69 meant “satisfactory.” A better healthcare smartphone app received a higher score (Table 3).
Table 3. Survey: Evaluation Tool Given to Participants for Rating Healthcare Smartphone Applications
If an evaluator rated all 23 items as 0 (not at all), the total would be 0.
If an evaluator rated all 23 items as 1 (a little), the total would be 23. So 0–23 was determined as the range for “poor.” “Poor” indicates the content is often incorrect.
If an evaluator rated all 23 items as 2 (a fair amount), the total would be 46. So 24–46 was determined as the range for “average.” “Average” indicates the content was a little helpful.
If an evaluator rated all 23 items as 3 (a lot), the total would be 69. So 47–69 was determined as the range for “satisfactory.” “Satisfactory” indicates the content is helpful and assessable.
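The scoring rule can be sketched in a few lines. The item count, the 0 to 3 scale, and the three score bands are taken from the text; the function names and the example ratings are hypothetical.

```python
from typing import Sequence

N_ITEMS = 23       # number of items in the final tool
MAX_RATING = 3     # 0 = "not at all" ... 3 = "a lot"


def total_score(ratings: Sequence[int]) -> int:
    """Sum the 23 item ratings after checking that each is on the 0-3 scale."""
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} ratings, got {len(ratings)}")
    if any(r not in range(MAX_RATING + 1) for r in ratings):
        raise ValueError("each rating must be an integer from 0 to 3")
    return sum(ratings)


def grade(total: int) -> str:
    """Map a total score (0-69) to the band described in the text."""
    if not 0 <= total <= N_ITEMS * MAX_RATING:
        raise ValueError("total must be between 0 and 69")
    if total <= 23:
        return "poor"
    if total <= 46:
        return "average"
    return "satisfactory"


# Hypothetical evaluation: twelve items rated 3 and eleven items rated 2.
ratings = [3] * 12 + [2] * 11
print(total_score(ratings), grade(total_score(ratings)))
```

The band boundaries follow directly from rating every item 1, 2, or 3, so a single item rated above the uniform level is enough to move an app into the next band.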
Discussion
Contents Aspects
Contents aspects consisted of the three subevaluating factors of accuracy, understandability, and objectivity, as assessed by nine evaluating items. The reliability was high (0.840). After the confirmatory factor analysis, two items were deleted: one item on accuracy, “the source of the healthcare information is clearly indicated,” and one item on understandability, “the app explained the gist of the healthcare information.” Other items apart from these two items were proven to be appropriate for evaluations of healthcare smartphone apps. The item “the source of the healthcare information is clearly indicated,” which was eliminated later due to a lack of necessity, had been developed based on the same context of an evaluating item for Web sites. The source of the knowledge provided on the healthcare smartphone apps should be clearly indicated, as unverified health information can influence the health of numerous people. 14 However, this item was eventually deleted because it failed to reflect the characteristics of smartphone apps. The item “the app explained the gist of the healthcare information” was also eliminated because the knowledge provided on healthcare smartphone apps already includes only limited and essential information due to the spatial constraints of the smartphone as a type of handheld computer.
Community Aspects
Community aspects consisted of two subevaluating factors made up of reactivity and participation. However, these two factors were both eventually eliminated, which can be understood as deemphasizing the necessity of community aspects during evaluations of healthcare smartphone apps. This inclination to treat community aspects as less important when evaluating healthcare smartphone apps is inconsistent with the findings of a survey on the current status of smartphone use in the second half of 2012, 19 which indicated that communication took second place, at 54.4%, in the app usage patterns of the smartphone users.
In this study, reactivity meant “the level of communication exchanged between users and app operators or healthcare information experts using e-mail, message boards, FAQs, etc.,” and the definition of participation was “reflecting opinions exchanged among interactive relationships such as those between the provider and the user, between the user and the administrator, and between the user and another user.” That is, communication is an essential element of community. After the content validity was reviewed, evaluating items of reactivity and participation in the community aspects were “the app has a function for the user's opinion,” “users' questions are answered promptly and appropriately inside the app,” “users make frequent use of free discussion forums, chat rooms, or message boards,” and “the app offers events which enable user participation.” These evaluating items were found to have low validity. It can be assumed that their low validity scores were due to their failure to reflect the distinct characteristics of healthcare smartphone apps precisely, as these questions were developed based on the evaluating items for healthcare information Web sites in previous studies. Therefore, it may be valuable if other community subevaluating factors that reflect the distinct characteristics of smartphone apps are proposed and developed.
Interface Design Aspects
Interface design consisted of the three subevaluating factors of consistency, the suitability of the design, and the accuracy of words, with 11 evaluating items. The item “the app design leaves a space inside the app, not packed with all of the letters” was eliminated as a suitability of design item. This eliminated item was also developed based on evaluating items for healthcare information Web sites from previous studies. On Web sites, there are naturally specific spaces, making it possible to evaluate whether the Web design uses the space to create a visually pleasant and comfortable interface. However, with regard to app interface designs, the situation appears to be different. App designs should leave a space inside the app, not packed with all of the letters despite the lack of space due to the relatively small screens of smartphones. Therefore, evaluating items should be suggested and developed again after reflecting the distinct characteristics of smartphone apps.
Technological Aspects
Technological aspects consisted of one subevaluating factor on security, with three evaluating items. All evaluation items pertaining to the stability of the system and the item “the app provides a backup and recovery function for starting over in the event of a system failure” for security were deleted.
The elimination of the stability of the system is inconsistent with the result of a previous study that asserted that an evaluation tool for educational smartphone apps should include evaluating questions on the stability of the system as subevaluating factors. According to previous work, educational smartphone apps should have a stable system. 16 It is considered that the same would apply to healthcare smartphone apps. However, two evaluating items on the stability of the system in the present study were deleted because these questions were developed based on the evaluating items for healthcare information Web sites from previous studies and were thus insufficient when used to reflect the distinct characteristics of smartphone apps. Therefore, evaluating items that can properly evaluate the stability of the system based on the distinct functions of smartphone apps should be developed after a further review of the literature.
The eliminated security item, “the app provides a backup and recovery function for starting over in the event of a system failure,” should therefore be modified to reflect the distinct characteristics of smartphone apps.
Strengths and Weaknesses
Strengths
First, in order to evaluate healthcare smartphone apps, three types of evaluation factors (i.e., contents, interface design, and technology) should be considered at the same time. Not only one aspect but various types of aspects should be evaluated simultaneously. Second, the evaluation tool in the present study is a proper tool for evaluating healthcare smartphone apps given that it was developed through validation by an analysis of the content validity, a confirmatory factor analysis, and a reliability analysis. Third, the evaluation tool for healthcare smartphone apps developed in the present study is thought to establish the structural basis for evaluating currently used healthcare smartphone apps.
Weaknesses
One weakness was the gender breakdown of the participants. Most of them were females, which may have biased the evaluation. Another weakness is that the community aspects, excluded from the present study, should be included in additional work in this area so as to develop more appropriate evaluation items. A third weakness is related to the generalization of this healthcare management smartphone app tool: more and different classification evaluation groups are needed for further study.
Conclusions
Because this study sought to develop a smartphone app evaluation tool for healthcare management, specific smartphone apps should be chosen and actually used in developing the evaluation tool itself.
Acknowledgments
This work was supported by National Research Foundation of Korea grant 2010-0028631 funded by the Korean Government.
Disclosure Statement
No competing financial interests exist.
