Development and Evaluation of an Evaluation Tool for Healthcare Smartphone Applications

Abstract

Introduction: Various types of healthcare smartphone applications (apps) have been released in recent years, making it possible for people to manage their health anytime and anywhere. As a healthcare provider, who has the responsibility to provide guidance as to which apps can be used? The purpose of this study was to develop and evaluate an evaluation tool for the various aspects of healthcare smartphone apps.

Materials and Methods: In the first phase, a provisional version of an evaluation tool for healthcare smartphone apps was developed from a review of previous studies. In the second phase, the provisional tool was modified and edited after verification by five experts with regard to its content validity. In the third phase, from September 25 to October 4, 2013, 200 responses were collected to verify the construct validity and reliability of the tool.

Results: The edited tool had 23 evaluating items with three evaluating factors along with seven subevaluating factors as a result of confirmatory factor analysis. The reliability was found to be high (0.905).

Conclusions: This study is meaningful because it demonstrates a healthcare smartphone app evaluation tool that is proven in terms of its validity and reliability. The evaluation tool developed and tested in this study is an appropriate and widely applicable tool with which to evaluate healthcare smartphone apps to determine if they are reliable and useful. However, this evaluation tool represents the beginning of the research in this area.

Introduction

Recent years have seen the increased adoption of smartphones, a type of handheld computer, by healthcare professionals as well as by the general public.¹ The number of smartphone users who start and finish their day with their smartphone is growing every day. As of 2013, it was estimated that 85 billion applications (apps) would be downloaded.² Smartphone app downloads are growing rapidly, resulting in nearly 1 billion apps downloaded monthly.³ Among them, many are related to healthcare.⁴

Various types of healthcare smartphone apps are available, including not only apps to manage or monitor one's blood glucose level or blood pressure, exercise, or diet, but also apps for cancer patients or patients suffering from chronic diseases, apps specifically for female healthcare, or even those enabling the keeping of personal health records.^5,6

However, many of the users of these healthcare apps reportedly delete the apps soon after installing them due to a lack of content or inconvenience when using them.⁵ Various healthcare apps are produced without validation of their efficiency, quality, or accuracy.⁷

An evaluation tool in this area would be an essential guideline for evaluating smartphone contents systematically. It is important to derive evaluating factors from evaluation tools on healthcare Web sites based on the distinct characteristics of healthcare apps, as smartphone apps can be regarded as a downscaled version of Web sites.⁸ Thus far, studies that evaluate healthcare smartphone apps are in their early stages. Related studies mostly focus on recent trends or on issues related to usability, safety, or acceptability. However, an evaluation tool for healthcare smartphone apps should be developed systematically. In this context, the purpose of this study is to develop and test an evaluation tool for healthcare smartphone apps and to assess its validity and reliability.

Materials and Methods

Research Design

This study is a descriptive and psychometric assessment of a new evaluation tool that has been developed and verified for use in assessing the utility of healthcare smartphone apps.

Research Measures

Developing a provisional version of a healthcare smartphone app evaluation tool

The evaluating factors were derived from an analysis of previous studies.^9–13 The derived factors were all noted commonly in previous studies, thought to be important regarding the reliability of health information and consumer satisfaction, or considered for inclusion for other reasons to evaluate healthcare smartphone apps effectively. The four evaluating factors were the contents aspects, community aspects, design aspects, and technological aspects.

The subevaluating factors were derived from an analysis of previous studies related to the four evaluating factors.^9–16 The contents aspects consisted of accuracy, understandability, and objectivity. Community aspects were made up of reactivity and participation. Design aspects consisted of consistency, the suitability of the design, and the suitability of the vocabulary. Technological aspects were the security and stability of the system.

Verifying the content validity by five experts

Five professional evaluators, all of whom are experts in the fields of medical informatics, nursing informatics, and healthcare business, verified the content validity of the provisional version of the healthcare smartphone app evaluation tool. The number of experts was determined based on previous research that reported that it was desirable for more than 3 and fewer than 10 experts to verify content validity.¹⁷ An analysis of evaluating items for healthcare smartphone apps was conducted in order to verify the suitability of the evaluating factors, subevaluating factors, and evaluating items. For a quantitative evaluation of the content validity, the content validity index (CVI) and a 4-point Likert scale (1=“unrelated,” 2=“cannot evaluate the relativity without editing the item,” 3=“related but editing is needed,” and 4=“highly related and described concisely”) were used.¹⁸ The five experts answered closed-ended questions.

Verifying the construct validity and reliability

To verify the construct validity and reliability, app users rated the provisional version of the healthcare smartphone app evaluation tool after it had been modified and edited according to the results of the experts' evaluations. In total, 200 responses were collected using a convenience sampling method. The number of respondents was determined based on a previous study that suggested that the proper sample size for verifying the final draft of a questionnaire ranges from 100 to 200 in order to represent the population properly.¹⁸

Data Analysis

SPSS version 20.0 software and AMOS version 21.0 (both from IBM, Armonk, NY) were used as follows:

• A frequency analysis was done to test the general characteristics of the subjects.

• The CVI was calculated to determine the content validity. A CVI above 0.8 was considered to be highly valid, whereas a score below 0.5 was considered to be invalid.¹⁸

• A confirmatory factor analysis was used to verify the construct validity.

• Cronbach's alpha, which examines the interitem correlations between the items on a questionnaire, was calculated to verify the reliability.
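
The CVI step in the list above can be illustrated with a minimal Python sketch. It assumes a common item-level definition in which the CVI of an item is the proportion of experts who rate it 3 or 4 on the 4-point scale; the text does not state which variant of the index was computed. The function names, the label "intermediate" for values between the two cut-offs, and the example ratings are hypothetical and are not data from this study.

```python
from typing import Sequence


def item_cvi(ratings: Sequence[int]) -> float:
    """Item-level CVI: share of experts rating the item 3 or 4 (assumed definition)."""
    if not ratings:
        raise ValueError("at least one expert rating is required")
    if any(r not in (1, 2, 3, 4) for r in ratings):
        raise ValueError("ratings must lie on the 4-point scale (1-4)")
    relevant = sum(1 for r in ratings if r >= 3)
    return relevant / len(ratings)


def classify_cvi(cvi: float) -> str:
    """Apply the cut-offs stated in the text: above 0.8 and below 0.5."""
    if cvi > 0.8:
        return "highly valid"
    if cvi < 0.5:
        return "invalid"
    # The text does not name this middle band; the label is hypothetical.
    return "intermediate"


# Hypothetical ratings from five experts for two items.
item_a = [4, 4, 3, 4, 3]
item_b = [4, 3, 2, 4, 1]
print(item_cvi(item_a), classify_cvi(item_cvi(item_a)))
print(item_cvi(item_b), classify_cvi(item_cvi(item_b)))
```

With five experts, an item-level index of this kind can only take the values 0, 0.2, 0.4, 0.6, 0.8, and 1.0. The sketch reads "above 0.8" strictly, so an item rated relevant by four of five experts would fall in the middle band; that reading is an assumption.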

Results

Developing the Provisional Version of the Healthcare Smartphone App Evaluation Tool

Eight previous studies were reviewed, and evaluating factors and subevaluating factors for the evaluation tool were derived from them. Also, evaluating items were derived after review of a total of 18 previous studies. As a result of this process, the first version of the healthcare smartphone app evaluation tool had 35 evaluating items, 4 evaluating factors, and 10 subevaluating factors.

Verifying the Content Validity by Five Experts

The five experts were asked to review and edit the provisional (first) version of the healthcare smartphone app evaluation tool. As a result, they removed 2 evaluating items and left 33 items. The subevaluating factors then consisted of three questions on accuracy, four on understandability, four on objectivity, two on reactivity, two on participation, three on consistency, six on the suitability of the design, three on the accuracy of the vocabulary, four on security, and two on the stability of the system.

Verifying the Construct Validity and Reliability

From September 25 to October 4, 2013, a total of 200 undergraduate and graduate students majoring in nursing science or medicine participated in the verification of the construct validity and reliability of the tool. The researcher distributed and collected the 200 questionnaires in person. All 200 questionnaires were analyzed after a data cleaning process.

Characteristics of the general evaluators

The 200 evaluators were made up of 11 (5.5%) male respondents and 189 (94.5%) female respondents. iOS (Apple, Cupertino, CA) users numbered 45 (22.5%), whereas Android™ (Google, Mountain View, CA) users numbered 155 (77.5%). With regard to the level of education of the evaluators, the respondents were undergraduate or graduate students who were nursing or medical students. All respondents had to have a strong medical background and be knowledgeable of medical content.

Verifying the construct validity

A confirmatory factor analysis was used to verify the construct validity, that is, to determine whether the 33 evaluating items were properly developed according to the four evaluating factors.

Below are the results of the analysis: questions with negative values or those with a squared multiple correlations score below 0.4 were eliminated. A lack of fit with the model was detected; therefore, another confirmatory factor analysis was conducted to improve the model fit. As a result, the question with the lowest squared multiple correlations score was deleted (Table 1).

Table 1.

Results of the Confirmatory Factor Analysis

SUBEVALUATING FACTOR, EVALUATING ITEM	ESTIMATE	SE	CR	P VALUE
Accuracy
Information provided in the healthcare app is accurate (there is no inaccurate information).	0.662
Clear information is provided in the healthcare app.	0.739	0.134	8.574	<0.001
Understandability
The healthcare information in the app is readily understandable.	0.504
The healthcare information in the app is explained in everyday terms.	0.561	0.152	8.158	<0.001
People in general can easily read the healthcare information provided in the app.	0.517	0.138	8.033	<0.001
Objectivity
Professional healthcare information is provided.	0.638	0.109	9.602	<0.001
Healthcare information is provided systematically.	0.680	0.115	9.817	<0.001
There is an indication that the healthcare information is cited from authoritative sources.	0.517	0.129	8.823	<0.001
Medical experts provide the healthcare information.	0.459
Consistency
The app has coherence in terms of color, configuration, and expression method.	0.655	0.085	12.287	<0.001
Icon arrangement is in harmony with the whole app design.	0.635	0.079	12.089	<0.001
Icons are categorized coherently in the app.	0.705
Suitability of design
Arrangement of contents is well organized enough to be sequentially accessible and logically understandable.	0.512	0.098	9.252	<0.001
The meaning of each icon is clearly expressed.	0.502	0.104	9.166	<0.001
The app has highly readable typography.	0.570	0.109	9.723	<0.001
Visual elements do not confuse users.	0.702	0.103	10.644	<0.001
The structure of the app can be clearly grasped.	0.490
Accuracy of wording
Instructions are told in a concise manner.	0.648	0.111	9.858	<0.001
Instructions are told in a precise manner.	0.877	0.122	10.232	<0.001
All words are not merely spelled correctly but also grammatically correct.	0.440
Security
The app offers information about privacy protection.	0.778	0.123	10.303	<0.001
The app offers information about security policies related to personal health information.	0.838	0.125	10.488	<0.001
The app explained the security system for creating a safe environment for better mobile app usage.	0.532	0.123	8.923	<0.001

App, application; CR, critical ratio; SE, standard error.

Verifying the reliability

The reliability was verified to determine if the complete evaluation tool and each evaluation question had satisfactory interitem consistency. The reliability of all of the evaluating items was high (0.905). The reliability of factor I (contents; 0.840), factor II (interface design; 0.891), and factor III (technology; 0.870) was also high (Table 2).

Table 2.

Evaluating the Reliability

EVALUATING FACTOR, SUBEVALUATING FACTOR	NUMBER OF EVALUATING ITEMS	CRONBACH'S ALPHA (SUBEVALUATING FACTOR)	CRONBACH'S ALPHA (EVALUATING FACTOR)	CRONBACH'S ALPHA (TOTAL)
Factor I (contents)			0.840
Accuracy	2	0.822
Understandability	3	0.768
Objectivity	4	0.838
Factor II (interface design)			0.891
Consistency	3	0.857
Suitability of design	5	0.859
Accuracy of wording	3	0.834
Factor III (technology)			0.870
Security	3	0.870
Total	23			0.905
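
The reliability coefficients reported in Table 2 are Cronbach's alpha values. As a minimal sketch of how such a coefficient is obtained, the Python function below applies the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score), where k is the number of items. The study itself used SPSS; the function name and the small response matrix are hypothetical and are not data from this study.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of scores."""
    if len(responses) < 2:
        raise ValueError("at least two respondents are required")
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer every item")

    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers from four respondents to three items scored 0-3.
example = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 3, 2],
    [3, 3, 3],
]
print(round(cronbach_alpha(example), 3))
```

The same function could be applied to the items of a single subevaluating factor, to all items of an evaluating factor, or to all 23 items, which corresponds to the three levels reported in Table 2.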

Survey: Evaluation tool given to participants to rate healthcare smartphone apps

User documentation needs to be offered for the correct use of the evaluation tool for healthcare smartphone apps.

The expected primary users of this evaluation tool are healthcare professionals. The purpose of this study was to develop and objectively evaluate the evaluation tool with regard to various aspects of healthcare smartphone apps. The scoring system and the analysis method were as follows: a 4-point Likert scale (0=“not at all,” 1=“a little,” 2=“a fair amount,” and 3=“a lot”) was used, and the total score was the sum of the scores of the 23 items. The total score had a possible range of 0 to 69. A total from 0 to 23 meant “poor,” from 24 to 46 meant “average,” and from 47 to 69 meant “satisfactory.” A better healthcare smartphone app was given a higher score (Table 3).

Table 3.

Survey: Evaluation Tool Given to Participants for Rating Healthcare Smartphone Applications

EVALUATING FACTOR, SUBEVALUATING FACTOR	EVALUATING ITEM	NOT AT ALL (0)^a	A LITTLE (1)^b	A FAIR AMOUNT (2)^c	A LOT (3)^d
Contents
Accuracy
	Information provided in the healthcare app is accurate (there is no inaccurate information).
	Clear information is provided in the healthcare app.
Understandability
	The healthcare information in the app is readily understandable.
	The healthcare information in the app is explained in everyday terms.
	People in general can easily read the healthcare information provided in the app.
Objectivity
	Professional healthcare information is provided.
	Healthcare information is provided systematically.
	There is an indication that the healthcare information is cited from authoritative sources.
	Medical experts provide the healthcare information.
Interface design
Consistency
	The app has coherence in terms of color, configuration, and expression method.
	Icon arrangement is in harmony with the whole app design.
	Icons are categorized coherently in the app.
Suitability of design
	Arrangement of contents is well organized enough to be sequentially accessible and logically understandable.
	The meaning of each icon is clearly expressed.
	The app has highly readable typography.
	Visual elements do not confuse users.
	The structure of the app can be clearly grasped.
Accuracy of wording
	Instructions are told in a concise manner.
	Instructions are told in a precise manner.
	All words are not merely spelled correctly but also grammatically correct.
Technology
Security
	The app offers information about privacy protection.
	The app offers information about security policies related to personal health information.
	The app explained the security system for creating a safe environment for better mobile app usage.

^a If an evaluator rated all 23 items 0 (not at all), the total would be 0.

^b If an evaluator rated all 23 items 1 (a little), the total would be 23. Thus, 0–23 was set as the range for “poor.” “Poor” indicates the content is often incorrect.

^c If an evaluator rated all 23 items 2 (a fair amount), the total would be 46. Thus, 24–46 was set as the range for “average.” “Average” indicates the content was a little helpful.

^d If an evaluator rated all 23 items 3 (a lot), the total would be 69. Thus, 47–69 was set as the range for “satisfactory.” “Satisfactory” indicates the content is helpful and assessable.
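
The scoring rule described above can be written as a short Python sketch: the 23 item ratings (each 0 to 3) are summed, and the total is mapped to one of the three bands. The function name and the example ratings are hypothetical; only the number of items, the scale, and the band boundaries are taken from the text.

```python
from typing import Sequence, Tuple

NUM_ITEMS = 23  # number of evaluating items in the final tool


def score_app(item_ratings: Sequence[int]) -> Tuple[int, str]:
    """Return the total score (0-69) and its band for one evaluator's ratings."""
    if len(item_ratings) != NUM_ITEMS:
        raise ValueError(f"expected {NUM_ITEMS} ratings, got {len(item_ratings)}")
    if any(r not in (0, 1, 2, 3) for r in item_ratings):
        raise ValueError("each rating must be 0, 1, 2, or 3")

    total = sum(item_ratings)
    if total <= 23:
        band = "poor"
    elif total <= 46:
        band = "average"
    else:
        band = "satisfactory"
    return total, band


# Hypothetical evaluation: 22 items rated "a little" and one rated "a fair amount".
print(score_app([1] * 22 + [2]))
```

A total of 23 still falls in the “poor” band, so a single item rated one step higher is enough to move an otherwise uniform “a little” evaluation into “average.”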

Discussion

Contents Aspects

Contents aspects consisted of the three subevaluating factors of accuracy, understandability, and objectivity, as assessed by nine evaluating items. The reliability was high (0.840). After the confirmatory factor analysis, two items were deleted: one item on accuracy, “the source of the healthcare information is clearly indicated,” and one item on understandability, “the app explained the gist of the healthcare information.” Other items apart from these two items were proven to be appropriate for evaluations of healthcare smartphone apps. The item “the source of the healthcare information is clearly indicated,” which was eliminated later due to a lack of necessity, had been developed based on the same context of an evaluating item for Web sites. The source of the knowledge provided on the healthcare smartphone apps should be clearly indicated, as unverified health information can influence the health of numerous people.¹⁴ However, this item was eventually deleted because it failed to reflect the characteristics of smartphone apps. The item “the app explained the gist of the healthcare information” was also eliminated because the knowledge provided on healthcare smartphone apps already includes only limited and essential information due to the spatial constraints of the smartphone as a type of handheld computer.

Community Aspects

Community aspects consisted of two subevaluating factors made up of reactivity and participation. However, these two factors were both eventually eliminated, which can be understood as deemphasizing the necessity of community aspects during evaluations of healthcare smartphone apps. This inclination to treat community aspects as less important when evaluating healthcare smartphone apps is inconsistent with the findings of a survey on the current status of smartphone use in the second half of 2012,¹⁹ which indicated that communication took second place, at 54.4%, in the app usage patterns of the smartphone users.

In this study, reactivity meant “the level of communication exchanged between users and app operators or healthcare information experts using e-mail, message boards, FAQs, etc.,” and the definition of participation was “reflecting opinions exchanged among interactive relationships such as those between the provider and the user, between the user and the administrator, and between the user and another user.” That is, communication is an essential element of community. After the content validity was reviewed, evaluating items of reactivity and participation in the community aspects were “the app has a function for the user's opinion,” “users' questions are answered promptly and appropriately inside the app,” “users make frequent use of free discussion forums, chat rooms, or message boards,” and “the app offers events which enable user participation.” These evaluating items were found to have low validity. It can be assumed that their low validity scores were due to their failure to reflect the distinct characteristics of healthcare smartphone apps precisely, as these questions were developed based on the evaluating items for healthcare information Web sites in previous studies. Therefore, it may be valuable if other community subevaluating factors that reflect the distinct characteristics of smartphone apps are proposed and developed.

Interface Design Aspects

Interface design consisted of the three subevaluating factors of consistency, the suitability of the design, and the accuracy of wording, with 11 evaluating items. The item “the app design leaves a space inside the app, not packed with all of the letters” was eliminated as a suitability of design item. This eliminated item was also developed based on evaluating items for healthcare information Web sites from previous studies. On Web sites, there are naturally specific spaces, making it possible to evaluate whether the Web design uses the space to create a visually pleasant and comfortable interface. However, with regard to app interface designs, the situation appears to be different. App designs should still leave some empty space rather than being packed with text, despite the lack of space due to the relatively small screens of smartphones. Therefore, evaluating items should be suggested and developed again after reflecting the distinct characteristics of smartphone apps.

Technological Aspects

Technological aspects consisted of one subevaluating factor on security, with three evaluating items. All evaluation items pertaining to the stability of the system and the item “the app provides a backup and recovery function for starting over in the event of a system failure” for security were deleted.

The elimination of the stability of the system is inconsistent with the result of a previous study that asserted that an evaluation tool for educational smartphone apps should include evaluating questions on the stability of the system as subevaluating factors. According to previous work, educational smartphone apps should have a stable system.¹⁶ It is considered that the same would apply to healthcare smartphone apps. However, two evaluating items on the stability of the system in the present study were deleted because these questions were developed based on the evaluating items for healthcare information Web sites from previous studies and were thus insufficient when used to reflect the distinct characteristics of smartphone apps. Therefore, evaluating items that can properly evaluate the stability of the system based on the distinct functions of smartphone apps should be developed after a further review of the literature.

The eliminated security item, “the app provides a backup and recovery function for starting over in the event of a system failure,” should therefore be modified to reflect the distinct characteristics of smartphone apps.

Strengths and Weaknesses

Strengths

First, in order to evaluate healthcare smartphone apps, three types of evaluation factors (i.e., contents, interface design, and technology) should be considered at the same time. Not only one aspect but various types of aspects should be evaluated simultaneously. Second, the evaluation tool in the present study is a proper tool for evaluating healthcare smartphone apps given that it was developed through validation by an analysis of the content validity, a confirmatory factor analysis, and a reliability analysis. Third, the evaluation tool for healthcare smartphone apps developed in the present study is thought to establish the structural basis for evaluating currently used healthcare smartphone apps.

Weaknesses

One weakness was the gender breakdown of the participants. Most of them were females, which may have biased the evaluation. Another weakness is that the community aspects, excluded from the present study, should be included in additional work in this area so as to develop more appropriate evaluation items. A third weakness is related to the generalization of this healthcare management smartphone app tool: more and different classification evaluation groups are needed for further study.

Conclusions

As this study sought to develop a smartphone app evaluation tool for healthcare management, certain smartphone apps should be chosen and actually used to develop the evaluation tool itself.

Acknowledgments

This work was supported by National Research Foundation of Korea grant 2010-0028631 funded by the Korean Government.

Disclosure Statement

No competing financial interests exist.

References

1. Mosa, Yoo, Sheets. A systematic review of healthcare applications for smartphones. BMC Med Inform Decis Mak, 2012; 12:67.

2. Portio Research. 2013. Available at www.portioresearch.com/en/major-reports/current-portfolio/mobile-applications-futures-2013-2017.aspx (last accessed October 10, 2014).

3. Tech Computer Science. Google Play reaches 15 billion downloads. 2012. Available at http://techcomputerscience.com/blog/2012/05/07/google-play-reaches-15-billion-downloads/ (last accessed March 5, 2013).

4. AppBrain. Most popular Android market categories. 2012. Available at www.appbrain.com/stats/android-market-app-categories (last accessed March 5, 2013).

5. Wang, Park, Choi. Acceptance of applications for smartphones healthcare key influencers. Korea Contents Soc, 2011; 11:396–404.

6. Shim. Factors related to the intent to use the medical application (M-APP) smart phone of hospital employees [Master's dissertation]. Wonju, Korea: Yonsei University, 2011.

7. Bindihm, Freeman, Trevena. Pro-smoking apps for smartphones: The latest vehicle for the tobacco industry? Tob Control, 2014; 23:e4.

8. Kapps. 2011. Available at http://kapps.co.kr/bbs/board.php? botable=m52&wr_id=32 (last accessed November 13, 2012).

9. Hong. A mechanism for the evaluation of Internet shopping malls, based on the 3C-D-T framework. Chung-Ang Manage Rev, 2002; 28:255–270.

10. Kim. Study on investigative driving an evaluation model for Internet website. Manage Inf Syst Rev, 2002; 9:117–137.

11. Choi. Study on the evaluation model of the web sites [Master's dissertation]. Seoul: Chung-Ang University, 2007.

12. Lee. The evaluation and impact factors of website satisfaction of university libraries: Focused on the Kangwon Province areas [Master's dissertation]. Chuncheon, Korea: Kangwon National University, 2008.

13. Lee. Case study for identifying user's needs and improving college website [Doctoral dissertation]. Seoul: Hanyang University, 2010.

14. Jeong, Park. Development of health information on the Internet rating system. J Korean Soc Med Inform, 2000; 6:53–66.

15. Lee. Study on the evaluation factors of the medical and healthcare Web sites [Master's dissertation]. Seoul: Sogang University, 2000.

16. Lee. Development of evaluation tool for educational applications [Master's dissertation]. Suwon, Korea: Ajou University, 2013.

17. Lynn. Determination and quantification of content validity. Nurs Res, 1986; 35:382–386.

18. Lee, Lim, Park. Nursing research and statistical analysis. Seoul: Soomoonsa, 2009.

19. Korea Communications Commission. The survey on the current status of smartphone use in the second half. Available at www.google.co.kr/url?sa=t&rct=j&q=&esrc=s&frm=1&source=web&cd=3&ved=0CDcQFjAC&url=http%3A%2F%2Fisis.kisa.or.kr%2Fboard%2FfileDown.jsp%3FpageId%3D060200%26bbsId%3D3%26itemId%3D799%26athSeq%3D1&ei=OQhXVfbzBOHEmQX0n4GwDg&usg=AFQjCNEkyHO11s4QGExBpQyGGaVd-F4Z0w&sig2=DTYWkKHlQAf1zB_CC6Wv_w&cad=rjt (last accessed November 23, 2013).