Abstract
Introduction
Playing is one of the most powerful vehicles of human learning. It is the favourite way of learning in early childhood and remains a meaningful and enjoyable one during adulthood (Kerr & Apter, 1991). The recent surge of interest in video games with various educational and health foci therefore comes as no surprise. For instance, the majority of the publications with the keyword combination games* health recorded on Web of Knowledge were published after 2006 (1890 publications, accounting for 87% of all publications since 1980). Among the attractive features of this learning mode are those associated with classic games, such as entertainment/fun (Baranowski, Buday, Thompson & Baranowski, 2008; Prensky, 2005), the presence of challenge, interactivity (Salen & Zimmerman, 2005; Wattanasoontorn, Boada, Garcia & Sbert, 2013), feedback and purpose (Hsu, Lee, & Wu, 2005; Wattanasoontorn et al., 2013). What serious games (SGs) add to these features is an explicit or implicit learning objective on top of the mere entertainment objective of classic games (Miller, Chang, Wang, Beier & Klisch, 2011). This promising mix of delivery ingredients gave rise to serious games applied to health training and health promotion, such as training to avoid medical errors (Graafland, Schraagen, & Schijven, 2012, for a review), games aimed at promoting health behaviours (e.g. Fuchslocher, Niesenhaus, & Krämer, 2011; Tortolero et al., 2010), as well as gaming programs for rehabilitation, including the training of motor functions (Bonnechère, Jansen, Omelina, & Van Sint Jan, 2016).
Whilst serious games differ from entertainment games in their purpose of educating and enhancing the acquisition of various skills, they also differ from computer-delivered interventions in their immersive features and enjoyment, which make them more motivating than the latter. In the context of health promotion, behavioral change is facilitated through provision of health information, provision of lifestyle skills, creation of opportunities to practice the behavior and behavioral reinforcement via the game (DeSmet et al., 2014). However, which game features, and which mix of features, best facilitate these behavioral change processes remains largely unknown.
Serious games for motor rehabilitation
In 2014, 15% of the world’s population had some form of disability, and 2–4% of those affected experienced significant difficulties in daily functioning. After musculoskeletal conditions such as back pain and arthritis, the most prevalent conditions are degenerative and traumatic motor conditions such as stroke, traumatic brain and spinal cord injuries, together with limb loss and multiple sclerosis (Ma, Chan & Carruthers, 2014).
In motor rehabilitation, SGs have attracted considerable interest because of the major concern that patients were not meeting the standards of movement practice that would allow the neuroplastic adaptations underlying behavioral improvement (Lang, Macdonald, Reisman et al., 2009; Lohse, Shirzad, Verster, Hodges & Van der Loos, 2013). This happens because motor (re)learning is a complex sequence of transitions from declarative to procedural knowledge (Wouters et al., 2009) and from the cognitive, to the associative, and to the autonomous phase of movement (Fitts & Posner, 1967). Only after extensive practice does the performer reach the autonomous phase, where movements appear fluent and rather effortless, with a small error rate (Adams, 1971; Wulf, 2007). Repetition not only produces motor learning but also leads to incremental improvement in task/goal performance (Holden, 2005). Such extensive repetition may be difficult to tolerate and may result in patients becoming frustrated, tired and less motivated, with further negative effects on engagement and on the speed and quality of recovery (Dobkin, 2005; Hocine, Gouaich, & Cerri, 2014). To enhance motor learning and maintain motivation, game developers have focused on modes of interaction between the person and the game environment that resemble real life, by incorporating features that detect and mirror natural body movements (de Souza, Gadelha, Coutinho, dos Santos, Pantoja & Pereira, 2012). This creates a feeling of immersion in the virtual environment which, together with carefully designed features such as story genre, fantasy, design, and rewarding character, potentially influences cognitive processes such as interest and motivation. These, in turn, reinforce practice and motor learning (Baranowski et al., 2008; Girard et al., 2012; Wouters et al., 2013).
Still, the growing interest in SGs for rehabilitation has not been matched by an equivalent rise in interest in developing games with sound theoretical bases (DeSmet et al., 2014). This may account for the sizeable variation and ambiguity in their reported effectiveness (Sitzmann, 2011; Wouters, van Nimwegen, van Oostendorp & van der Spek, 2013). For instance, some reviews argue for increased effectiveness of SGs in motor skills recovery, especially after stroke (e.g. Holden, 2005; Saposnik & Levin, 2011). Others show that data are still inconclusive in neurorehabilitation due to poor quality of evidence, variability of studies, and strong positive publication biases (Wiemeyer, 2014). Moreover, while SGs may be helpful as delivery tools for teaching, there is insufficient research showing that they also have direct effects on learning outcomes (Girard, Ecalle & Magnan, 2012). Lastly, many claims of effectiveness are supported largely by narrative reports rather than quantitative evidence, with the bulk of research consisting of pilot and case studies and only sparse randomized trials (Wiemeyer, 2014).
The present study addresses these gaps in two ways. First, we aimed to test the effectiveness of physical rehabilitation games in improving motor functions, with a focus on upper limb recovery and balance/movement following various forms of traumatic or degenerative brain damage. Second, we assessed different study characteristics and game features (feedback, activities, characters, background) which might contribute to SGs’ effectiveness in motor rehabilitation. Pinpointing these ingredients will be informative and a first step towards designing evidence-based SGs.
Methods
Search strategy and study eligibility
We searched for clinical trials and prospective studies that described the use of games in motor rehabilitation, with no time interval constraints, in Science Direct, ProQuest Central, EBSCO, Sage, Springer, PubMed, Scopus, Web of Science, PsycArticles and the Cochrane Registry of Controlled Trials. The searches combined terms related to “game”, “virtual environment”, “motor rehabilitation”, “humans”, “adults” (MeSH terms), filtered by design type, year of publication and English language, and spanned up to March 2016. The search was independently performed by the first two authors and was supplemented by screening the reference lists of identified papers and by using the approach recommended by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. The inclusion criteria were: a) studies used one of the following designs: one-group pre-test and post-test (no randomization), pre-test and post-test with equivalent control groups, post-test only with equivalent control group, randomized controlled trial with equivalent control group, or non-randomized trial with equivalent control group; b) studies used games as the interface for rehabilitation in at least one of the groups; c) studies focused mainly on motor rehabilitation and on participants with motor problems; d) interventions specifically targeted adults (>18 years old); e) reports contained enough statistical results to compute the effect size of the intervention; f) at least part of the computerized intervention had game features such as a clear purpose, game rules, and feedback/interaction with the user (Prensky, 2001). We included only papers published in scientific journals and written in English. No master’s or PhD theses or other grey literature were taken into consideration.
We excluded studies that a) presented software that provided only instruction without interaction (a substitute for an online physiotherapist); b) tested virtual environments in which game features were absent; c) provided no motor outcomes.
Data extraction and quality appraisal
Data were extracted independently by one senior researcher and two PhD students based on a common data template (Supplementary e-Table 1). This included study design, sample size, outcome measures/motor rehabilitation target, average age, number and percentage of females, and number and duration of training sessions. For continuous outcomes, we extracted means (M) and standard deviations (SDs) at pre- and post-test, and mean differences, 95% CIs and p values from the between-group statistical analyses, in order to calculate effect sizes. For dichotomous outcomes, we extracted odds ratios (ORs). Two researchers independently assessed the overall quality of evidence in the studies using the Oxford Centre for Evidence-based Medicine levels (“OCEBM Levels of Evidence”, 2016) and resolved discrepancies after joint article review. Additional game videos and tutorials for the games presented in the selected papers were searched for on social media channels (YouTube), and demos were also coded using a common data recording scheme, according to the presence/absence and type of features such as avatar, feedback, type of framing and competition, type of characters, etc. These coded characteristics were constructed by the first two authors and were developed to cover as many relevant game features as possible, based also on categories proposed in related SG research (Charsky, 2010; Garris, Ahlers & Driskell, 2002) (Supplementary e-Table 2). We included in the analyses both commercial games having an educational purpose besides the entertainment one and games developed in research laboratories aiming primarily at recovering patients’ motor functions.
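To illustrate how extracted means and SDs can be turned into an effect size, the following minimal Python sketch computes a between-group Cohen's d with its approximate sampling variance, and converts an odds ratio to d with the logit method. The function names and example numbers are hypothetical, and the text does not state which formulas the software applied, so this shows one standard textbook approach rather than the authors' exact computation.

```python
import math


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference between two independent groups.

    Returns (d, var_d), where var_d is the usual large-sample
    approximation of the sampling variance of d.
    """
    # Pooled standard deviation across the two groups
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    )
    d = (m1 - m2) / pooled_sd
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return d, var_d


def odds_ratio_to_d(odds_ratio):
    """Convert an odds ratio to d via the logit method: ln(OR) * sqrt(3) / pi."""
    return math.log(odds_ratio) * math.sqrt(3) / math.pi


# Hypothetical post-test scores: game group vs. control group
d, var_d = cohens_d(m1=10.0, sd1=2.0, n1=20, m2=8.0, sd2=2.0, n2=20)
print(f"d = {d:.2f}, variance = {var_d:.4f}")
```

The variance returned alongside d is what a pooling step needs in order to weight each study.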
Data synthesis and analytic strategy
All analyses were conducted using Comprehensive Meta-Analysis software, version 2.2.050. Since not all studies reported effect sizes, we calculated them from the numerical data available in the original research reports. We computed an average effect size whenever more than one outcome for the same category was used in a study or when the same outcome was measured with more than one instrument. We conducted planned subgroup analyses according to the random effects model (Borenstein, Hedges & Higgins, 2016) based on the instruments used to assess performance type (for upper limb performance and balance/movement), study design (between versus within subjects), agent of measurement (objective performance versus evaluator-reported versus patient-reported performance), total evidence quality score, and selected participant and game design features. Subgroup analyses were carried out for mutually exclusive group categories, unless otherwise specified. For continuous variables we conducted meta-regressions to test associations with the effect size of interest, as indicated by a Z value and an associated p value. Because of the heterogeneity of studies and of reported data, we only included post-test measurements as indicators of effects, and no follow-ups were tested. Sensitivity analyses were conducted, and the potential for publication bias was assessed with the fail-safe N, which represents the number of studies with null effects needed to raise the p value above the conventional threshold of p = 0.05 (Rosenthal, 1979). A funnel plot was used to complement this analysis. Heterogeneity of studies was further quantified using I², with values greater than 50% indicating large inconsistencies (Huedo-Medina, Sánchez-Meca, Marín-Martínez & Botella, 2006).
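The random-effects pooling described above can be sketched in a few lines of Python. The version below uses the DerSimonian-Laird (method of moments) estimator of between-study variance, which is a common default but is an assumption here, since the text does not name the estimator. The function name and the input values are hypothetical and serve only as an illustration.

```python
import math


def random_effects_pool(effects, variances):
    """Pool study effect sizes under a random-effects model.

    Uses the DerSimonian-Laird estimate of between-study variance (tau^2).
    Returns a dict with the pooled effect, its 95% CI, Cochran's Q and tau^2.
    """
    k = len(effects)
    # Fixed-effect (inverse-variance) weights and pooled estimate
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    # Method-of-moments estimate of tau^2, truncated at zero
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each study's variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "q": q,
        "tau2": tau2,
    }


# Hypothetical effect sizes (d) and sampling variances for four studies
result = random_effects_pool([0.3, 0.6, 0.9, 0.5], [0.04, 0.05, 0.08, 0.03])
print(result)
```

Because tau² is added to every study's variance, the weights are more even than under a fixed-effect model, so small studies count for relatively more when heterogeneity is present.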
Results
Studies and interventions characteristics
The combined database and hand searches (in the references of reviews and selected papers) yielded 18474 abstracts, from which 137 studies underwent screening. After full-text reading and applying the inclusion and exclusion criteria, 61 studies were retained. All studies were published between 2004 and 2015 (Fig. 1). The description of these studies and the games used are presented in Supplementary e-Table 2. Drop-out rates in individual studies were as low as 0% immediately after the intervention (one third of the studies), with a median drop-out of 4.5%.

PRISMA flowchart of the study selection process.
The majority of studies tested interventions aimed at developing balance and movement (n = 35), mainly using mobility and balance training tasks; the most frequent outcome measures were the Berg Balance Scale (BBS) (n = 21), the Timed Up & Go Test (TUG) (n = 18) and Functional Reach Tasks (FR) (n = 6). Measures of gait were also common, such as the 6- or 10-Minute Walk Test (6MWT; 10MWT) (n = 17) and Tinetti’s Mobility Assessments (POMA or TMA for gait, balance, total scores) (n = 6). Other studies tested various interventions for upper limb performance (n = 20), the most common of which aimed at improving motor and sensory hand functions as assessed with the Fugl-Meyer Assessment (FMA) (n = 17), and gross manual ability as assessed by the Box and Block Test (BBT) (n = 6), and the Wolf Motor Function Test (WMFT) (n = 4).
Most of the studies were carried out in clinical settings (n = 53), while only 5 followed participants at home. The total duration of interventions varied from 15 minutes (Ustinova, Leonard, Cassavaugh & Ingersoll, 2011) to 2025 minutes (Daniel, 2012), with a median value of 1552.5 minutes spread across 1 (Ustinova et al., 2011) to 56 (Slijper, Svensson, Backlund, Engström & Sunnerhagen, 2014) sessions (median number of sessions was 20).
There were 1627 participants overall, with a percentage of women that varied from 14% to 83.3%. The mean age of the participants across studies was 61 years (range: 32 to 86). The motor conditions of participants were distributed among studies as follows: stroke, 35 studies (57.3%); falls in the elderly, 9 studies (14.7%); multiple sclerosis, 5 studies (8.1%); acquired brain injuries, 4 studies (6.6%); Parkinson’s disease, 3 studies (4.9%); and one study (1.6%) for each of the following: Alzheimer’s disease, back and neck pain, cerebral palsy, vestibular hypofunction, and amblyopia. In the studies that reported it, the average time elapsed since the onset of motor impairment varied from 0.5 months (Yin, Sien, Ying, Chung & Tan May Leng, 2014) to 440.4 months (Jaume-i-Capó, Martínez-Bueso, Moyà-Alcover & Varona, 2014), with a median of 19 months.
Meta-analytic findings
Meta-analytic pooling of effect sizes for all 61 studies resulted in an overall d = 0.59 (95% CI, 0.48 to 0.71, p < 0.001), corresponding to a medium effect on motor outcomes. Heterogeneity across studies was small to moderate, Q = 90.9, p < 0.001, I² = 34.0%. Sensitivity analyses did not reveal biases that would alter these effects; the analysis of overall publication bias resulted in Z = 12.83, p < 0.001, with a fail-safe N = 2554 (studies with null findings needed for a 2-tailed p > 0.05). Inspection of the funnel plot revealed no asymmetrical distribution of effects; therefore, no trim-and-fill analyses were conducted (Fig. 2).
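The reported heterogeneity and fail-safe statistics can be cross-checked against one another with a short Python sketch. It assumes that the degrees of freedom for Q equal the number of studies minus one (61 - 1 = 60), and that the reported Z is a Stouffer-type combined Z (sum of study Z values divided by the square root of the number of studies). Neither assumption is stated in the text, and the function names are hypothetical.

```python
import math


def i_squared(q, k):
    """I^2 (percent) from Cochran's Q and the number of studies k."""
    df = k - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def fail_safe_n(combined_z, k, z_crit=1.959964):
    """Rosenthal's fail-safe N from a Stouffer combined Z.

    Assumes combined_z = sum(z_i) / sqrt(k); z_crit is the two-tailed
    critical value for alpha = 0.05.
    """
    sum_z = combined_z * math.sqrt(k)
    return (sum_z / z_crit) ** 2 - k


# Values reported in the text: Q = 90.9, Z = 12.83, k = 61 studies
print(f"I^2 = {i_squared(90.9, 61):.1f}%")
print(f"fail-safe N = {fail_safe_n(12.83, 61):.0f}")
```

Under these assumptions I² comes out at about 34.0%, matching the reported value, and the fail-safe N lands within a few studies of the reported 2554, a gap that is plausibly due to rounding of Z.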

Funnel plot of standard error by Cohen’s d (all studies).

Forest plot of main effect of games on motor rehabilitation.
There were no differences in effect sizes depending on the type of rehabilitation, Q = 0.40, p = 0.52, and no evidence that longer total trainings had larger effects on motor outcomes compared to shorter trainings, regardless of rehabilitation site (p = 0.05) (Table 1). Due to considerable variability in outcome measures, with most being reported in only one or two studies, subgroup analyses on the effects of SGs on balance and movement were performed for studies reporting outcomes measured with at least 1 of the 5 instruments most frequently used in the overall sample: BBS, FRT, TT, TUG, MWT. The results showed small to medium effects (0.3 to 0.7) with no significant heterogeneity. The same approach was applied to upper limb interventions, with significantly higher effect sizes for interventions using FMA as the outcome measure of performance, Q = 6.67, p = 0.03.
Subgroup comparisons based on study, sample, and game characteristics
a Studies where the value or category of a moderator was not specified were excluded from the analysis. b Rehabilitation for visual performance, back and neck, and tongue movement, each represented by only one study, was excluded from the analysis. df = degrees of freedom; 2D = two-dimensional; 3D = three-dimensional. BBS = Berg Balance Scale; BBT = Box and Block Test; FMA = Fugl-Meyer Assessment; FRT = Functional Reach Total; TT = Tinetti Mobility Total Score; TUG = Timed Up & Go Test; xMWT = ‘x’ Minutes Walking Test, where x can vary between 6 and 10; WMFT = Wolf Motor Function Test.
The analysis of the effects of serious games as a function of the agent that reported the outcomes (patient vs. therapist vs. objective measurement) revealed no differences, with small variability across studies Q = 3.6, p = 0.16. The overall quality of studies had no differential effects on outcomes of motor rehabilitation, Q = 5.4, p = 0.143, but lower quality studies generally reported larger effect sizes on motor indices of rehabilitation compared to higher quality ones.
Although the population included in the studies was relatively old (mean age range: 32 to 86 years), age was not significantly associated with motor outcomes (slope B = -0.001, p = 0.776). However, the proportion of women was associated with the effectiveness of interventions, with studies including more women reporting larger effects on motor outcomes (slope B = 0.008, p < 0.001) (Table 1).
Subgroup analyses of SGs effectiveness by games’ characteristics
The results of these analyses are presented in Table 1 and are organized in categories corresponding to various SG features: feedback, game activities, game characters, background. Of the 17 moderators tested, only 2 yielded significant differences in effect sizes across subgroups. First, type of activity showed larger effect sizes for games encompassing both individual and group activities, with large variability across studies, Q = 5.9, p = 0.01. Second, regarding scenario, fantasy games had the largest effect sizes, followed by mixed, abstract, and realistic scenarios respectively, with large variability across studies, Q = 15.8, p = 0.001. However, these results should be interpreted with caution, as the number of studies differs greatly across moderator categories. Thus, although the findings show a significant moderate effect of SG interventions on motor outcomes for upper limb and movement, there is no evidence in the data that these effects are attributable to game features, except for type of activity and scenario.
Discussion
This analysis of 61 studies showed that SGs are more effective in improving upper limb and movement/balance motor functions than no intervention or treatment as usual, with a moderate pooled effect size. This is in line with results reported in a similar meta-analysis on the effectiveness of virtual reality for post-stroke patients, which showed an overall medium effect size (Lohse, Hilderman, Cheung, Tatla & Van der Loos, 2014). Still, these findings are based on small-sample studies (Laver, George, Thomas, Deutsch & Crotty, 2015), with a maximum of 84 participants, and included a heterogeneous pool of studies, with around two thirds RCTs (n = 38), 5 CTs and 18 case series studies.
Intervention effects based on study and sample characteristics
As expected, the results show that rigorous trials with increased control over various factors affecting performance tended to have smaller effects than less controlled designs, although this difference did not reach statistical significance (Q = 5.4, p = 0.143). Generally, the results point to a small risk of bias, as assessed by the fail-safe N and funnel plot, and a small to medium heterogeneity of the studies included. It is worth mentioning that there were some notable sources of heterogeneity with more subtle potential influences on effect sizes, which are further acknowledged as limitations.
The other effect sizes for subgroup analyses were largely inconsistent. There were no sizeable differences in effectiveness between interventions for upper limb performance and balance/movement, suggesting that the serious gaming approach has similar potential in both areas. Moreover, the effects did not vary significantly across agents performing measurements, although there were somewhat smaller effect sizes observed for studies reporting objective performance compared to performance reported by patients or therapists. This is not surprising, as self-report biases are documented even in video game research (Kahn, Ratan & Williams, 2014).
We found subgroup differences across outcomes in upper limb performance. Outcomes assessed with the Fugl-Meyer Assessment (FMA) had the largest effects compared to outcomes assessed with the Box and Block Test (BBT) or the Wolf Motor Function Test (WMFT). There is limited evidence in the literature that would allow direct comparison with these findings, with only one recent Cochrane review (Laver et al., 2015) showing that upper limb performance was better in the studies which assessed it with FMA than in studies using composite measures. Since there were no patterns of significant heterogeneity for balance/movement and there are insufficient data in published research to allow for solid explanations, we cannot speculate as to why these differences occurred.
We did not find evidence that the length of training would systematically influence effectiveness, although one would expect that more time spent in training would lead to better motor outcomes. We found only one review to compare results with, which points in a similar direction of no discernible differences in effects based on the length of motor training (Laver et al., 2015).
Although there was no effect of age, there was significant heterogeneity of effect sizes as a function of the proportion of women within samples. Studies including more women had higher effect sizes overall compared to those with smaller numbers of women. While an explanation goes beyond the purpose of this study, some research suggests that women might have higher conscientiousness levels than men (Feingold, 1994), which might translate into higher involvement and adherence to treatment and thus better rehabilitation outcomes. This explanation is speculative at this point, and further research should investigate potential gender differences when studying motor rehabilitation effects.
Intervention effects based on game characteristics
As a secondary objective we assessed different study characteristics and game features (feedback, activities, characters, background) which might contribute to SGs’ effectiveness, with results showing that overall these are valuable features with medium to large effect sizes. Subgroup analyses revealed few differences across game features, with only type of activity (individual versus mixed) and realism of the scenario (abstract/fantasy/realistic/mixed) reaching the conventional threshold for significance. Games involving team activities had larger effect sizes than games with individual activities only, which is in line with research emphasizing the importance of cooperation in performance as well as the potential of overall social support in sustaining learning and health behaviors (Wouters, van Nimwegen, van Oostendorp & van der Spek, 2013). As for the type of scenario, the results indicate that fantasy scenarios had the largest effect size compared to the others (realistic, abstract, and mixed). This lends further support to the idea that materials and games including fantasy features are generally described as more creative and fun than those not including them, because they are more likely to use characters and scenarios to foster engagement (Baranowski et al., 2008).
Limitations
Our approach has its inherent limits. One of them pertains to some unaccounted sources of heterogeneity that might influence effects. First, research varied in terms of included measurements and outcomes (for both upper limb and balance/gait/movement), which may make a direct comparison between these studies difficult. Another source of heterogeneity comes from comparing games and programs developed for rehabilitation with commercial games that have a wider purpose. This was addressed by further testing game features that might contribute to heterogeneity of effect sizes regardless of game category (commercial or custom-made for rehabilitation). There was no systematic assessment of study quality and risk of bias, which, together with the small samples specific to these interventions and the heterogeneity of measured outcomes, makes it difficult to rule out the potential for systematically biased reporting. Still, we coded the strength of evidence from all included studies and used it in subgroup analyses, showing that effects were in the same direction regardless of whether they came from RCTs or case series designs, although the magnitudes of these effects were indeed different. Also, the analyses revealed a low risk of bias and a small heterogeneity of effects, both of which point to small chances of systematic distortions of effectiveness indicators. Most studies provide rather sketchy game descriptions, making the coding of certain characteristics difficult. This was especially true for games used for scientific purposes, for which demos and supplementary materials were sparse, as opposed to commercial games, for which plenty of extra materials were available. Finally, there may be some studies which we overlooked despite employing systematic and comprehensive search criteria and engaging researchers with considerable expertise in this type of research.
Conclusion
To our knowledge, this is the first systematic investigation of SG effectiveness in different rehabilitation domains and also the first of its kind to allow further elaboration of specific game features potentially associated with effectiveness in the recovery of motor functions. We advanced a new way of looking at existing interventions, focusing not only on the delivery mode (traditional or improved with SGs) but also on those features that would improve the experience of SGs and enhance intervention effects. While this is the first study of this kind, it raises interesting transdisciplinary research avenues by combining the science of rehabilitation with that of creating games that are not only instructive but also highly entertaining. As the burden of physical impairments is significant, these findings highlight what works best in SGs for motor rehabilitation, knowledge which should ideally be adopted and incorporated in the standard development of interactive games for these medical conditions.
Conflict of interest
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 643535 (WOMEN-UP project). The funding source has no involvement in the design and conduct of the study; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of the manuscript.
Acknowledgments
The authors wish to acknowledge the WOMEN-UP consortium, especially the contributions of: Dr. Montserrat Espuña, Marteen Dicker and Dr. Mireya Fernandez for their very insightful comments on an earlier draft of this article.
