Abstract
Introduction
Increasing smartphone access has allowed for increasing development and use of smartphone applications (apps). Mobile health interventions have previously relied on voice or text-based short message services (SMS); however, the increasing availability and ease of use of apps has allowed for significant growth of smartphone apps that can be used for health behaviour change. This review considers the current body of knowledge relating to the evaluation of apps for health behaviour change. The aim of this review is to investigate approaches to the evaluation of health apps to identify any current best practice approaches.
Method
A systematic review was conducted. Data were collected and analysed in September 2016. Thirty-eight articles were identified and have been included in this review.
Results
Articles were published between 2011 and 2016. Thirty-six were reviews or evaluations of apps related to one or more health conditions; the remaining two reported on an investigation of the usability of health apps. Studies investigated apps relating to the following areas: alcohol, asthma, breastfeeding, cancer, depression, diabetes, general health and fitness, headaches, heart disease, HIV, hypertension, iron deficiency/anaemia, low vision, mindfulness, obesity, pain, physical activity, smoking, weight management and women’s health.
Conclusion
In order to harness the potential of mobile health apps for behaviour change and health, we need better ways to assess the quality and effectiveness of apps. This review is unable to suggest a single best practice approach to evaluate mobile health apps. Few measures identified in this review included sufficient information or evaluation, leading to potentially incomplete and inaccurate information for consumers seeking the best app for their situation. This is further complicated by a lack of regulation in health promotion generally.
Introduction
Smartphone ownership is expected to continue to grow in the coming years, especially among those aged over 50 years. 1 Increasing smartphone access has allowed for increasing development and use of smartphone applications, commonly referred to as ‘apps’. Apps are software that can be installed onto a mobile device, such as a smartphone or tablet, often for little or no cost. 2 Apps can be downloaded from a variety of digital marketplaces depending on the operating system of the device. The major digital marketplaces for apps are the Apple App Store (iTunes) and the Android store (Google Play), with a smaller number of apps available for other platforms including BlackBerry World and the Windows Phone Store. The number of apps created and downloaded has continued to grow since their introduction in 2008; by the middle of 2015, 100 billion apps had been downloaded from iTunes, 3 while Google Play had exceeded 25 billion app downloads. 4
In the past, mobile health interventions have typically relied on voice or text-based short message service (SMS);5–7 however, the increasing availability and ease of use of apps has allowed for a growth in smartphone apps that can be used for health behaviour change.8–10 Any person searching the iTunes or Google Play stores for a health app will see thousands of apps related to health and fitness. However, despite the growth in apps designed to assist behaviour change and to promote good health, there has been little research investigating the accuracy of the health or medical advice provided, the theoretical foundations that underpin the apps, or the effectiveness of these apps in changing behaviour or promoting health.
Part of the problem in evaluating the effectiveness or accuracy of information in apps is related to limitations in the methods used and inconsistencies in the approach to research in this area.11,12 There have been several previous systematic reviews that have investigated other aspects of the current research around mobile phone apps. Donker and colleagues 13 systematically investigated the effectiveness of mental health apps for mobile devices. Despite the small number of studies included in their review, they were able to conclude that apps have the potential to benefit those with poor mental health, but that more studies need to be conducted to really understand their usefulness. Payne and colleagues 14 undertook a systematic review that investigated the literature around app use in behaviour change, the behaviour change features, and the usefulness of apps in health behaviour change. While they found that apps were an acceptable tool in behaviour change interventions and that consumers were comfortable taking up apps, like other systematic reviews, this study called for more research into the potential of apps to change behaviour, citing a lack of large studies. In an investigation of mobile health technology (not limited to apps) Free and colleagues 15 found that text messaging services have a use in assisting patients to adhere to medication for some health conditions, but overall the findings were mixed with more research, particularly large studies, called for. The evidence around the use and usefulness of the main mobile health (mhealth) predecessor to apps, SMS health interventions, has also been systematically reviewed by Déglise and colleagues 7 who found that in general, mobile phones and SMS (either personalised or as a bulk message) can be an appropriate way to communicate with the population about health issues.
While all of these systematic reviews provide a picture of the current evidence around the use of apps for health behaviour change or disease management, none focus on the methods used to conduct the evaluations. This current research seeks to fill this gap, first by extending the work of BinDhim and colleagues, 16 through an updated search and a broader scope for inclusion, and secondly by focusing more on the actual methods for evaluation (rather than the classification undertaken by BinDhim and colleagues). 16 The aim of this review is to investigate the methods used in the evaluation of health apps and to identify any current best practice approaches.
Method and approach
Six databases were systematically searched to identify relevant publications: Academic Search Complete, CINAHL Complete, E-Journal, Inspec, MEDLINE Complete and PsycINFO. Included were studies that evaluated or reviewed apps that were published between January 2008, the year the Apple App Store was launched, and April 2015. Data were collected and analysed in September 2016. The search terms consisted of three constructs. The first was related to the type of device, with the search limited to studies focusing on mobile phones, smartphones, cell phones and tablets. The second construct was related to content of the review; search terms included health, wellbeing, preventative health, smok*, nutrition, alcohol, physical activity or mental wellbeing. The third construct related to terms that would indicate an evaluation or review, and included the search terms: analys*, assess*, evaluation, review, study and behaviour. A combination of the three constructs was used to conduct the search.
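The three-construct search described above can be sketched as a Boolean query. This is an illustration only: it assumes that terms within a construct were joined with OR and that the three constructs were joined with AND, which the text does not state explicitly. The function names `quote_phrase` and `build_query` are hypothetical, and the exact syntax accepted by each database (field tags, truncation rules) will differ.

```python
# Illustrative sketch only. Assumption: OR within a construct, AND between
# constructs. The search terms are those listed in the text above.
DEVICE_TERMS = ["mobile phones", "smartphones", "cell phones", "tablets"]
CONTENT_TERMS = [
    "health", "wellbeing", "preventative health", "smok*",
    "nutrition", "alcohol", "physical activity", "mental wellbeing",
]
EVALUATION_TERMS = ["analys*", "assess*", "evaluation", "review", "study", "behaviour"]


def quote_phrase(term):
    """Wrap multi-word terms in quotes so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term


def build_query(*constructs):
    """Join terms with OR inside each construct, then join constructs with AND."""
    groups = [
        "(" + " OR ".join(quote_phrase(term) for term in terms) + ")"
        for terms in constructs
    ]
    return " AND ".join(groups)


query = build_query(DEVICE_TERMS, CONTENT_TERMS, EVALUATION_TERMS)
print(query)
```

Keeping each construct as a separate list makes it straightforward to report the strategy and to rerun an updated search with additional terms.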
The inclusion criteria comprised studies that evaluated mobile health apps in English, evaluations or reviews of apps targeted at consumers, evaluations or reviews of apps targeted at both consumers and health professionals, and studies that evaluated the effectiveness of mobile health apps. Excluded were studies that evaluated mobile health apps targeted only at health professionals, formative evaluations of mobile health apps, studies of apps that were not publicly or commercially available, systematic reviews of app evaluation studies (including those of BinDhim et al. and Zapata et al.),16,17 studies that reported primarily on the validation of any mobile health app tool,18–20 and studies of apps not related to health. Articles were first screened by title and abstract based on the inclusion and exclusion criteria. The full texts of selected articles were then obtained for further assessment for final inclusion.
Table 1. Quality and bias assessment checklist.
Results
General characteristics
The database search using the key search terms resulted in a total of 2835 journal articles. After reviewing the titles and abstracts, 379 were found to be duplicates; the titles of the remaining 2459 were reviewed and 117 were found to meet the initial selection criteria (see Figure 1). After examination of the full text, 38 articles fully met the inclusion criteria and were included in the final review.
Figure 1. Flow diagram of selection process.
Table 2. Types and methods of evaluation.
Evaluation methods
In their review or evaluation of apps, studies employed at least one of four main types of appraisal: content analysis, theory or evidence-based appraisal, usability appraisal, and/or an appraisal of effectiveness. One study 23 looked only at the number of apps available for each health condition and was excluded from this analysis as no app evaluation was completed. Of the remaining 37 studies, 20 involved only one approach to evaluation, 16 included two approaches to evaluation, and only one incorporated three approaches to evaluation (see Table 2).
Twenty-two of the studies conducted a content analysis.23–46 In these studies, authors undertook a review of key features, functions, or quality of the app, or apps were simply categorised by the condition they addressed. The second type of appraisal related to the investigation of features that were based on theory or evidence within the app; this type of appraisal was identified in 17 studies.11,27,29,31,35,36,38,40,43,44,47–53 These studies investigated strategies that were promoted by the app, identified behavioural change theories incorporated into the app, or investigated the app for the inclusion of evidence, clinical outcome or health professional involvement in developing the apps. Thirdly, 10 studies investigated the usability of apps.11,21,28,35,37,39,41,54–56 These studies reported on app design, interface, ease of use, user engagement or user experience. The final type of app appraisal was an investigation of effectiveness, identified in four studies,21,22,30,43 and was used to determine if the apps can lead to behavioural change or if there was any evidence that the apps would lead to behavioural change (see Table 2).
Four types of methods were used in the evaluations: self-developed checklists, established checklists, user feedback and matched-case control design. Self-developed checklists were the most frequently used method (n = 28). Checklists were individually developed by authors based on theory, established models, literature, practice guidelines or actual user evaluation that could be applied to apps. Self-developed checklists were used in 21 reviews categorised as content analysis, and 13 of the 17 reviews that used a theory or evidence-based approach (see Table 2). Bender and colleagues 30 and Cohn and colleagues 43 also used a self-developed checklist to investigate evidence indicating effectiveness.
Nine studies used an established checklist in their evaluation. Conroy et al. 44 used the Coventry, Aberdeen and London-Revised taxonomy (CALO-RE), 57 Direito et al., 47 Middelweerd et al., 48 and Vollmer Dahlke and colleagues 50 used the earlier Abraham and Michie 58 taxonomy of health behaviour change, while Morrissey and colleagues 53 used a longer taxonomy, also designed by Michie and colleagues, 59 to investigate any behaviour change features. Each of these studies reported low scores on these scales, indicating that behaviour change techniques are infrequently used in health apps. Patel et al. 25 and Bardus et al. 45 used the more recent Mobile App Rating Scale (MARS) designed by Stoyanov and colleagues to assess quality. 20 Wearing and colleagues 51 employed the Flesch-Kincaid Grade Level readability formula for content analysis of paediatric obesity apps. Finally, the Nielsen and Molich 60 usability heuristics were used by Hundert and colleagues 35 to evaluate usability of headache management apps.
User feedback was also used to evaluate apps. This method typically involved examining consumer comments and ratings downloaded from app stores or obtaining feedback from actual users through focus groups, questionnaires, or interviews. This method was mainly used for usability evaluation (n = 12) but was also used by Casey and colleagues 21 to examine effectiveness. Only one study 22 used a matched-case control trial method to evaluate the effectiveness of an app (see Table 2).
Recommendations
Several studies identified areas for improvement within the apps. Weaver and colleagues 56 reviewed 48 apps that were designed to assist users to track their blood alcohol concentration (BAC). These apps were reviewed by collecting data from a field-based study measuring participants’ gender, age, number of drinks consumed, hours spent drinking and their BAC level. This study found that most of the apps produced varied and unreliable results, and in a subsequent focus group discussion, participants stated that they were sceptical of the accuracy of the apps, particularly as most of the apps were difficult to use. Kumar and colleagues 37 investigated hypertension apps in Google Play and found that while 14% could transform the smartphone into a medical device to measure blood pressure or heart rate, none were approved by the US Food and Drug Administration (FDA) or validated, casting doubts on the usability and reliability of the apps. Five other studies were unable to identify any evidence, validation or health professional involvement in the development of their reviewed apps,30,35,40–42 while three studies identified limited discussion of evidence-base or professional involvement in their reviews.36–38
Discussion
This literature review examined 38 studies to identify a best practice approach to the review or evaluation of apps that may promote health or encourage behaviour change. The studies employed a variety of methods, and, overall, the quality of the studies was found to be medium to high. Studies employed at least one of four types of evaluation, each covering a different aspect of the mobile health apps. Half of the studies focused on one type of evaluation only; only one attempted to assess three aspects of the reviewed apps. Content analysis was the most frequently used type of evaluation, while evaluation of effectiveness was the least used.
One of the key objectives of this review was to identify any best practice approaches to reviewing apps that promote health or encourage behaviour change. However, as self-developed evaluation checklists were identified as the most frequently used method for evaluating health related apps, it is difficult to point to a best practice approach. None of the self-developed evaluation checklists were validated, and despite some studies indicating moderation between reviewers,26,27,30–32,49,51–53,61 inter-rater reliability was low in studies where it was tested,27,31,48,52,53 casting doubts on the reliability of the tool or the approach and therefore the findings of the study. The fact that self-developed evaluation checklists were widely used is further evidence of the current lack of validated evaluation tools to assess mobile health apps.
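To make the notion of inter-rater reliability concrete, the sketch below computes Cohen's kappa for two reviewers scoring the same checklist items. It is an illustration under stated assumptions: the reviewed studies did not necessarily use this statistic, the function name `cohens_kappa` is hypothetical, and the ratings are invented example data, not results from any included study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: proportion of items given the same code by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: derived from each rater's marginal category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both raters used one identical category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented example: 1 = feature judged present in the app, 0 = absent.
reviewer_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
reviewer_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
print(round(cohens_kappa(reviewer_1, reviewer_2), 2))
```

In this invented example the reviewers agree on 8 of 10 items, yet kappa is noticeably lower than 0.8 because part of that agreement would be expected by chance. This is why raw percentage agreement can overstate the reliability of a checklist.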
Of the four studies that used a pre-existing behaviour change instrument to evaluate the apps,44,47,48,50 three47,48,50 used the Taxonomy of Behaviour Change developed by Abraham and Michie, 58 while Morrissey and colleagues 53 used a longer taxonomy, also designed by Michie and colleagues, 59 to investigate any behaviour change. This instrument has been found to be reliable in identifying behaviour change techniques in interventions. The other study 44 used the CALO-RE instrument, 57 an extension of the Abraham and Michie 58 taxonomy. As it currently stands, these taxonomies are the most reliable and most commonly used instruments for assessing intervention content around behaviour change.
Two studies, Patel et al. 25 and Bardus et al., 45 used the new MARS 20 in their assessment of apps. This scale has been designed to be a reliable, multidimensional measure for trialling, classifying, and rating the quality of mobile health apps.
One of the challenges identified by this review was the appropriateness of the models used in the studies reviewed to evaluate the apps. Only four studies incorporated questions evaluating the specific mobile technology features in their checklist, suggesting that the other studies may not have taken the mobile environment into consideration. This is important as current theories and checklists may be inadequate in informing mobile interventions, a consideration made by Riley and colleagues, 63 who have advocated for the development of new theories to capture the unique capabilities of mobile interventions. Furthermore, none of the studies examined whether the behaviour change or health promoting features incorporated specifically into the apps were appropriate for the mobile environment, or if they could be effective when used in an app. If the features are found to be ineffective, it is unlikely that the app will facilitate successful behaviour change, highlighting this as an important area for future research.
Another area of concern for those interested in the behaviour change opportunities present in apps is the lack of information in these studies about readability, privacy or security. Previous research has highlighted the importance of privacy and security when seeking to promote behaviour change through mobile apps.64,65 There is some evidence suggesting that consumers believe that their right to privacy is lost when engaging in the mobile environment, yet Internet users are also often concerned about how their personal information is used. 66 The study by Chomutare and colleagues 32 was the only study to acknowledge this gap in their evaluation, highlighting that their study of diabetes apps did not look at privacy or security at all. There is also an emerging body of evidence that points to readability as an important aspect of health and behaviour change technology in the online space, 67 with studies suggesting that readability of online health information is above the average reading ability of consumers.68,69 Concerns have been raised that information presented in these forms may be misunderstood and possibly cause harm. 70 Of the 38 studies included in this review, only Wearing and colleagues 51 evaluated the readability of the app content; however, they indicated that this was included as an afterthought and not all their reviewed apps had undergone the readability test.
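As an illustration of how a readability appraisal could be built into an app review, the sketch below applies the standard Flesch-Kincaid Grade Level formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59, to a passage of text. The coefficients are those of the published formula; everything else is an assumption made for illustration. In particular, the syllable counter is a crude vowel-group heuristic, the function names are hypothetical, and this is not the procedure used by Wearing and colleagues.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels.

    This is a heuristic, not a dictionary lookup, so individual words may be
    miscounted.
    """
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing 'e' as silent unless the word ends in 'le'.
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text):
    """Flesch-Kincaid Grade Level of a passage (approximate US school grade)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Invented example passages, not taken from any reviewed app.
plain = "Take your pill each day. Drink water with it."
dense = "Medication adherence necessitates comprehensive physiological monitoring."
print(flesch_kincaid_grade(plain) < flesch_kincaid_grade(dense))
```

A reviewer could run each app's instructional text through such a function and compare the resulting grade level with the expected reading ability of the target audience, keeping in mind that automated syllable counts are approximate.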
There are some clear limitations identified in the studies reviewed. Only 17 of the studies actually downloaded the apps for review. Two studies indicated that their self-developed evaluation checklists were not validated, and reliability could not be ascertained,41,47 or criteria of the questionnaire could be affected by the subjectivity of the evaluator. 62 Abroms and colleagues 27 acknowledged that their guidelines were developed for a clinical setting and only assumed to be effective in a mobile app context and further research would be needed to confirm such assumptions. Middelweerd and colleagues 48 also pointed out that using Abraham and Michie’s taxonomy forced evaluators to translate strategies, originally designed for other behaviour change interventions, into app features and might lead to different interpretations among researchers, resulting in the low inter-rater reliability in their study. Five studies pointed out that the behavioural change techniques or strategies found should be interpreted with caution, and that in their study they could not determine the effectiveness of these features.11,29,31,47,50
Limitations
There are some limitations of this review that should also be acknowledged. While every attempt was made to ensure this literature review was comprehensive, taking in all the available literature, additional articles may have been missed. However, given that the other two systematic reviews of evaluation methods found during the search had included 10 or 22 studies,16,17 and this review found 38, the authors are confident that there is little information that is not presented here. Given the variety of different apps under study, there may also be some difficulty in making direct comparisons across the different areas of health behaviour change. However, as this is not a meta-analysis, the authors do not feel this should invalidate the findings. Three studies about specific evaluation tools were also identified, but are not presented here as they did not meet the inclusion criteria. Two focused on usability;18,19 the other study was a validation of the 23-item tool, the MARS, 20 which is focussed on engagement, functionality, aesthetics and information quality.
Conclusion
In order to harness the potential of mobile health apps for behaviour change and health, we need better ways to assess the quality and effectiveness of apps. This review is unable to suggest a single best practice approach to evaluate mobile health apps. Few measures identified in this review included sufficient information or evaluation, leading to potentially incomplete and inaccurate information for consumers seeking the best app for their situation. This is further complicated by a lack of regulation in health promotion generally.
For those seeking to complete a review of behaviour change and health promoting apps, we suggest the inclusion of three components: (a) a review of usability and functionality, (b) some critique of the app's potential to promote behaviour change, and (c) an assessment of the quality of the health-related content within the apps. We were unable to find a single study or evaluation tool incorporating these three components. Such a tool would assist consumers in identifying high quality and effective health apps.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: They wish to acknowledge funding from the Victorian Health Promotion Foundation that was used to complete this work.
