Abstract
Physical activity (PA) is a component of total energy expenditure. PA and PA energy expenditure (PAEE) can be estimated by objective techniques (OTs). However, the use of questionnaires is frequent in clinical settings and epidemiological studies. We conducted a search on PubMed, Scopus, and Google Scholar databases to perform a review of studies reporting the reliability and validity of PA questionnaires validated against OTs, namely doubly labeled water (DLW) or accelerometers, in free-living adults. We selected original articles published between 2009 and 2019 that reported validation studies of PA questionnaires. We identified 53 studies that fulfilled the eligibility criteria. Four PA questionnaires were validated against DLW and the remaining against accelerometers. Three questionnaires were compared with both DLW and accelerometer results. The correlation between questionnaire-estimated PAEE and DLW results ranged from r = .22 to r = .46, while that between questionnaire-estimated total PA (TPA) and accelerometer results ranged from r = .11 to r = .54. The intraclass correlation coefficients were between .56 and .84. Despite having good reliability, most of the questionnaires included in this review have shown limited validity for estimating TPA in adults. OTs should be considered as a first option, when possible. Further research is warranted on techniques to obtain more accurate PA and PAEE estimates.
Dietary intake should provide the correct amount of energy to balance energy expenditure. In adults, daily energy requirements are equivalent to the 24-hour total energy expenditure (TEE) and include the energy costs of the basal metabolic rate (BMR), diet-induced thermogenesis (DIT), and physical activity [PA] (United Nations University & World Health Organization, 2004). The BMR accounts for 45%–70% of TEE. This component is often estimated using equations, such as those of Mifflin et al. (1990) and Harris and Benedict (1914) or those suggested by the Food and Agriculture Organization and the World Health Organization (United Nations University & World Health Organization, 2004). The second component—the DIT—is related to the amount of energy needed for food digestion, nutrient absorption, and storage/elimination, which is estimated to be around 10% of the individual BMR (United Nations University & World Health Organization, 2004). The third component is related to PA, which is highly variable within and between individuals. Therefore, determining PA energy expenditure (PAEE) and its contribution to TEE is still a challenge (European Food Safety Authority, 2013). The physical activity level (PAL)—proposed by an expert panel—allows classification of adult population lifestyles based on the description of activities in an average day (United Nations University & World Health Organization, 2004). However, there is a broad range of values within each PAL category (sedentary or light activity lifestyle: 1.40–1.69, active or moderately active lifestyle: 1.70–1.99, and vigorous or vigorously active lifestyle: 2.00–2.40).
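As a rough illustration of how these components relate, the following Python sketch estimates BMR with the Mifflin et al. (1990) equation and assigns a PAL value to the lifestyle categories listed above. It assumes that PAL is expressed as TEE divided by BMR, and that the category boundaries can be treated as contiguous bands; the function names and the example subject are hypothetical and are not taken from the reviewed studies.

```python
def mifflin_bmr(weight_kg, height_cm, age_y, sex):
    """Resting energy estimate (kcal/day) from the Mifflin et al. (1990) equation."""
    base = 10.0 * weight_kg + 6.25 * height_cm - 5.0 * age_y
    return base + 5.0 if sex == "male" else base - 161.0


def classify_pal(pal):
    """Map a PAL value (assumed to be TEE / BMR) onto the categories quoted in the text."""
    if 1.40 <= pal < 1.70:
        return "sedentary or light activity"
    if 1.70 <= pal < 2.00:
        return "active or moderately active"
    if 2.00 <= pal <= 2.40:
        return "vigorous or vigorously active"
    return "outside the tabulated range"


# Hypothetical subject: 70 kg, 175 cm, 30-year-old man with a TEE of 2900 kcal/day.
bmr = mifflin_bmr(70, 175, 30, "male")
pal = 2900 / bmr
print(round(bmr, 2), round(pal, 2), classify_pal(pal))
```

For this hypothetical subject, the "active or moderately active" band alone (PAL 1.70 to 1.99) corresponds to a TEE of roughly 2,800 to 3,280 kcal/day, which illustrates how broad each category is.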
There are objective and subjective techniques for estimating energy expenditure and PA. The doubly labeled water (DLW) technique is considered the “gold standard” for TEE estimation, but it is an expensive method with complex analysis. It requires administering heavy hydrogen (2H) and oxygen (18O) isotopes and, based on their kinetics, determining the difference of their elimination rates (Westerterp, 2017). However, this technique does not provide information on PA type, duration, and intensity (Ramuth et al., 2019). Objective techniques (OTs) to measure PA require the use of portable devices that record biosignals, such as heart rate, skin temperature, or acceleration. The use of accelerometers has increased significantly in recent years (Strath et al., 2013) possibly because of their small size and ease of use (Aparicio-Ugarriza et al., 2015). They monitor movements on different axes, providing information to derive PA frequency, duration, and intensity (Strath et al., 2013). Accelerometer results may vary due to factors such as their location site or the accuracy of the device to register water-related activities, weight-bearing, or surface inclination during walking/running (Yang & Hsu, 2010). There is also widespread use of subjective techniques based on PA self-reports, such as diaries, recalls, and questionnaires (Strath et al., 2013). Their advantages include the low cost, the low burden imposed on the subjects, and the potential for administration to large samples (Ndahimana & Kim, 2017). Questionnaires are therefore often used in clinical and nonclinical settings when OTs are not available, when a quick assessment is required, or when the condition of the subject does not allow for direct measures. Thus, the characteristics and properties of PA questionnaires and their target population should be understood before selecting them for research or nutritional counseling. 
We therefore aimed to review the reliability and validity of PA questionnaires that have been validated against OTs in free-living adults.
Method
An electronic search was conducted in the PubMed and Scopus databases and in the Google Scholar web search engine during January and February 2019 using the following descriptors: “physical activity questionnaire,” “validity,” and “adults.” The search term for PubMed was “physical activity questionnaire” [All Fields] AND “validity” [All Fields] AND “adult” [All Fields] AND (“2009/02/14” [PDat]: “2019/02/11” [PDat]). We selected original articles published between 2009 and 2019 in English or Spanish that reported validation studies of PA questionnaires in adult populations (18–65 years). Two researchers independently conducted an initial screening, checking for the descriptors first in the title and then in the abstract, and identified abstracts with information on questionnaires designed for adults and validated against OTs. We applied eligibility criteria and selected full-text articles that reported a questionnaire (≥2 items) validation process using DLW or accelerometers as the OTs. We excluded articles in which the questionnaire was validated against pedometers and those with unclear or incomplete validity information or no reported correlation coefficients. After discussing and applying the eligibility criteria, there was no disagreement between researchers about the studies to include in this review.
Different statistical approaches were used to evaluate validity. However, we compared only the correlation coefficients (Pearson’s or Spearman’s) reported by the studies. The strength of the correlation was evaluated as follows: r < .30 was considered a negligible or weak correlation, r = .30–.49 low, r = .50–.69 moderate, r = .70–.89 strong, and r > .90 very strong (Mukaka, 2012). Reliability was classified based on the intraclass correlation coefficient (ICC) as follows: ICC < .40 was considered poor, ICC = .40–.75 fair to good, and ICC > .75 excellent (Craven & Morris, 2010).
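The classification rules above can be written as two small functions. The Python sketch below is only an illustration: the function names are hypothetical, the strength of a correlation is judged on its absolute value, and values falling between the quoted bands (including exactly .90) are assigned to the next band up, which is an assumption of this sketch rather than something stated by the cited sources. The example values are the extremes reported in the abstract.

```python
def correlation_strength(r):
    """Label a correlation coefficient using the bands quoted from Mukaka (2012)."""
    a = abs(r)  # assumption: strength depends on magnitude, not sign
    if a < 0.30:
        return "negligible or weak"
    if a < 0.50:
        return "low"
    if a < 0.70:
        return "moderate"
    if a < 0.90:
        return "strong"
    return "very strong"


def icc_reliability(icc):
    """Label an ICC using the bands quoted from Craven and Morris (2010)."""
    if icc < 0.40:
        return "poor"
    if icc <= 0.75:
        return "fair to good"
    return "excellent"


for r in (0.22, 0.46, 0.54):
    print(r, correlation_strength(r))
for icc in (0.56, 0.84):
    print(icc, icc_reliability(icc))
```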
Results
The search process is summarized in Figure 1. We identified 652 matches with the descriptors: 273 in Google Scholar, 248 in the Scopus database, 126 in the PubMed database, and five in other sources. Seventy-nine articles included the descriptors in the title and/or abstract. After eliminating duplicates (n = 16), 63 studies remained. Three articles were not included because we did not have access to the full text. Articles that used pedometers as the OT (n = 2) or with insufficient validity data (n = 5) were excluded. Finally, 53 articles were included in this review. Four publications reported the validation process of more than one PA questionnaire. One study reported a simultaneous validation of seven questionnaires against DLW: the Japan Public Health Center–based Prospective Study-PAQ short and long forms (JPHC-PAQS and JPHC-PAQL, respectively), the International Physical Activity Questionnaire (IPAQ) short form, and the Global Physical Activity Questionnaire (GPAQ), plus the Japan Arteriosclerosis Longitudinal Study-PAQ, the National Integrated Project for Prospective Observation of Non-Communicable Disease Trends in the Aged 2010-PAQ, and the Jichi Medical School Cohort Study-PAQ, which were part of other instruments and were not included in this review (Sasai et al., 2018). Another study reported the validation process of three questionnaires: the Physical Activity Questionnaire for adults (PAQ-AD), the Assessment of Physical Activity Levels Questionnaire, and the IPAQ short form (Rodríguez-Muñoz et al., 2017).

Figure 1. Flowchart of literature search.
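The counts reported for the search process can be checked with a few lines of arithmetic. The Python sketch below uses only the figures given in the text; the variable names are illustrative.

```python
# Records identified per source, as reported in the Results.
identified = {"Google Scholar": 273, "Scopus": 248, "PubMed": 126, "other sources": 5}
total_identified = sum(identified.values())

with_descriptors = 79  # descriptors present in the title and/or abstract
after_duplicates = with_descriptors - 16  # duplicates removed

# Exclusions applied to the remaining articles.
exclusions = {"no full text": 3, "pedometer as OT": 2, "insufficient validity data": 5}
included = after_duplicates - sum(exclusions.values())

print(total_identified, after_duplicates, included)  # 652 63 53
```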
Table 1 shows the information about questionnaires designed for adults without diseases (n = 24), pregnant women and young mothers (n = 2), and adults with diseases or special conditions (n = 4). Studies validating either the IPAQ or the GPAQ are presented in Table 2.
Table 1. Studies Evaluating Validity and Reliability of PA Questionnaires in Adults.
Note. ACC = accelerometer; BMR = basal metabolic rate; cpm = counts per minute; MPA = moderate physical activity; MVPA = moderate-to-vigorous physical activity; LPA = light physical activity; EE = energy expenditure; PAEE = physical activity energy expenditure; TEE = total energy expenditure; MET = metabolic equivalent; ICC = intraclass correlation coefficient; CI = confidence interval; WT = walking for transportation; CT = cycling for transportation; WR = walking for recreation; CR = cycling for recreation; RMR = resting metabolic rate; BMI = body mass index.
a Some studies did not report reliability. b Spearman’s correlation coefficient. c Pearson’s correlation coefficient. d Correlation coefficient. e 95% CI not reported.
Table 2. Studies Reporting Validity and/or Reliability of the International Physical Activity Questionnaire (IPAQ) and the Global Physical Activity Questionnaire (GPAQ).
Note. cpm = counts per minute; MET = metabolic equivalent; BMR = basal metabolic rate; HEPA = health-enhancing aerobic physical activity; MPA = moderate physical activity; VPA = vigorous physical activity; MVPA = moderate-to-vigorous physical activity; LPA = light physical activity; TPA = total physical activity; ICC = intraclass correlation coefficient; NS = not specified; EE = energy expenditure; r = Spearman’s correlation coefficient unless otherwise specified; AEE = activity energy expenditure.
a Pearson’s correlation coefficient. b Not specified. c Raw activity data for each participant were converted to activity cpm-by-minute AEE (kcal kg-1 min-1) based on Heil’s algorithm. d ES = effect size. e Wilcoxon’s test. f Actiheart: Combined accelerometer and heart rate monitor device. g SenseWear Armband: Device that integrates information from a three-axis or two-axis accelerometer and sensors of galvanic response, skin temperature, and heat flux.
As shown in Table 1, five questionnaires were validated against DLW. These questionnaires asked participants to recall periods ranging from 1 month to 1 year, and DLW was measured over 11 (Bonn et al., 2012) to 15 days (Sasai et al., 2018). PAEE was estimated by subtracting the values of the DIT and either the BMR or the resting metabolic rate from TEE (Besson et al., 2011; Csizmadi et al., 2014; Sasai et al., 2018). The Recent Physical Activity Questionnaire (RPAQ) was also validated with Actiheart, which combines movement and heart rate sensors (Besson et al., 2011). The correlation coefficients between PA obtained from questionnaires and PAEE obtained by DLW ranged from r = .22 to r = .46, and for TEE, they ranged from r = .42 to r = .84.
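The subtraction used to derive PAEE from DLW-measured TEE can be sketched as follows. This is a minimal Python illustration rather than the procedure of any particular study: the function name and the example values are hypothetical, and DIT is passed in explicitly because the reviewed studies may have estimated it in different ways.

```python
def paee_from_tee(tee_kcal, resting_kcal, dit_kcal):
    """Return PAEE (kcal/day) as TEE minus DIT minus BMR (or RMR)."""
    paee = tee_kcal - dit_kcal - resting_kcal
    if paee < 0:
        raise ValueError("DIT plus resting expenditure exceeds TEE; check the inputs")
    return paee


# Hypothetical subject (values chosen only for illustration).
tee = 2600.0  # TEE measured by DLW, kcal/day
bmr = 1600.0  # BMR or RMR, kcal/day
dit = 260.0   # DIT, kcal/day

paee = paee_from_tee(tee, bmr, dit)
print(paee, round(paee / tee, 3))  # 740.0 0.285
```

Because PAEE is obtained as a residual, any error in the BMR or DIT estimate carries over directly into the PAEE value.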
Twenty-five questionnaires were validated against accelerometers, of which one was administered to pregnant women, one to young mothers, and four to adults with diseases (chronic nonspecific low back pain or rheumatoid arthritis) or special conditions (breast cancer survivors or those in a cardiac rehabilitation program) [Table 1]. Most questionnaires (n = 19) asked for a recall of 5–7 days of PA, and, in most such studies, the participants were asked to wear the accelerometer for 4–7 days during waking hours, except during water-related activities. The correlations between the PA estimates obtained by questionnaire and the accelerometer results were mostly weak or low. For total PA (TPA), they ranged from r = .24 to r = .48, except for one questionnaire that showed a moderate correlation with accelerometer measures (r = .54). The correlation coefficients were also weak or low for light, moderate, and moderate-to-vigorous PA (LPA, r = .069–.36; MPA, r = .19–.40; and MVPA, r = .08–.48, respectively). Only one study reported a moderate correlation for MVPA. For vigorous PA (VPA), the correlation values ranged from r = −.06 to r = .72. ICC values were around .7 for PAEE, .4–.8 for LPA, .68–.79 for MPA, .51–.94 for VPA, .07–.88 for MVPA, and .56–.84 for TPA. The test–retest periods ranged from 1 week to 3 months. Among the questionnaires designed for adults with diseases or special conditions, the correlation coefficients were .32–.46 for MVPA and .17–.48 for TPA. Regarding reliability, one study reported r = .624 for MVPA (Bähler et al., 2013), and Carvalho et al. (2017) found an ICC of .77 for TPA. The test–retest periods ranged from 7 to 25 days.
Table 2 shows studies validating the IPAQ and the GPAQ. The short (IPAQ-S) and long (IPAQ-L) forms of the IPAQ were validated in adults without diseases (n = 16) or with special conditions (intellectual disability [n = 1], chronic fatigue syndrome [n = 1], deafness [n = 1], post-traumatic stress disorder [n = 1], low back pain [n = 1], bipolar disorder [n = 1], HIV [n = 1], and fibromyalgia [n = 1]). Except for two studies that used DLW as the OT, the results of the IPAQ were compared with those obtained from accelerometers, of which two used the Actiheart and two used the SenseWear Armband. The correlation values between OT results and PA estimates obtained by the IPAQ-S, the IPAQ-L, or the GPAQ were weak to low for TPA (r = −.07 to .38), MPA (r = −.07 to .28), and MVPA (r = .02 to .48), except for one study that reported a moderate correlation between the IPAQ-S and OT results for TPA (r = .5).
Discussion
The studies included in this review reported weak or low correlations between PAEE estimated by DLW and questionnaires. Similarly, the correlation coefficients between questionnaire TPA estimates and accelerometer measures were mostly weak or low, except for one study that reported a moderate correlation.
The DLW technique is considered the gold standard for TEE assessment (Westerterp, 2017) and, although it does not specifically evaluate PA, the PAEE has been estimated by subtracting the BMR and DIT energy expenditure from TEE values (Csizmadi et al., 2014; Golubic et al., 2014; Sasai et al., 2018). This technique requires expensive equipment, limiting its use to research or laboratory settings. The short form of the JPHC-PAQ (Sasai et al., 2018), the STAR-Q (Csizmadi et al., 2014), and the RPAQ (Golubic et al., 2014) showed weak or low correlations with PAEE estimated by DLW. These questionnaires showed moderate (r = .50–.69) or strong (r = .70–.89) correlation with TEE. Applying a unique TEE value—obtained under certain conditions—could result in some degree of inaccuracy in subjects with highly variable PA patterns. In those cases, estimating PA would be useful for energy adjustment.
Accelerometers are the devices most frequently used to validate PA questionnaires. Most of them showed weak or low correlations between their estimates. Only the study by Rodríguez-Muñoz et al. (2017) reported a moderate correlation between accelerometer and the APAQ or the IPAQ-S estimates. Consistent with the results of this review, Smith et al. (2017) reported a criterion validity of r = .26 for the GPAQ, while Lee et al. (2011) reported values from r = .09 to r = .39 for the correlation between TPA estimated by the IPAQ-S and OTs.
The questionnaires included in this review showed differences in several characteristics: time period to recall, PA domains, uncounted time calculations, cutoff points to establish intensity categories, formatting and administration, and analyses. The questionnaires asked participants to recall their PA during the last year, the last month, or during the last week (5–7 days). Only one study asked for the previous day. Questionnaires asking to recall the previous week PA correlated better with OT results than those that asked to recall PA during the last year. The correlation values for TPA estimates were r = .37–.54 and r = .2–.3, respectively, whereas those for MVPA were r = .08–.5 and r = .17–.33. Regarding PA domains, most studies included occupation, transportation, leisure time activities, and sleep. However, questionnaires such as the Occupational Sitting and Physical Activity Questionnaire (Jancey et al., 2014; Van Nassau et al., 2015; Chau et al., 2012) or the modified MONICA optional study on physical activity questionnaire (MOSPA-Q; Chau et al., 2012) focused on occupational activity, whereas the Transport and Physical Activity Questionnaire (TPAQ; Adams et al., 2014) and the Modified Residential Environment (RESIDE; Jones et al., 2015) questionnaire focused on transportation modes, trip duration, distance, or destination. Thus, certain PA domains may be more accurately measured by some questionnaires, whereas other questionnaires may provide a TPA estimate or aim to classify individuals according to their PALs. Questionnaire focus may contribute to the diversity of results, depending on the activities and PA domains assessed, as some activities might contribute in different amounts to PAEE.
Because the sum of the time spent on the reported activities was sometimes less than 24 hours, several authors assigned a metabolic equivalent (MET) value to the uncounted time. They assumed values from 1.2 to 2.0 METs (Besson et al., 2011; Bonn et al., 2015; Sasai et al., 2018). This assumption may properly adjust PA estimates or may lead to over- or underestimation of PA, depending on the subject’s actual PA and the number of unreported hours. Additionally, several studies reported the categories of PA intensity and estimated the time that the subjects spent at each PAL. However, the criteria for such classification were heterogeneous. Six of the studies presented in Table 1 classified PA intensity according to the values proposed by Freedson et al. (1998), seven studies reported the values described by Troiano et al. (2008), two studies used other cut point values, and the authors of one study developed their own values (Bonn et al., 2015). The selection of cutoff points for PA intensity is an important aspect to take into account (Pedišić & Bauman, 2015). Using different cut points may result in differences in the time that the subject spent at each PA intensity category. This can be illustrated by a study conducted in postmenopausal women (Diniz et al., 2017). The authors compared the agreement between different cut point values to evaluate MVPA. The best agreement was between the Freedson et al. (1998) and Troiano et al. (2008) cut points. These cut points and those proposed by Sasaki et al. (2011) had good agreement, but using the cut points proposed by Copeland and Esliger (2009) resulted in poor agreement. Rodríguez-Muñoz et al. (2017) suggest that the use of counts would be useful to avoid the variability due to the use of cut points. In their study, the correlation between the PAQ-AD and accelerometer results was better when they analyzed accelerometer counts than when they used MVPA cut points. However, the 95% confidence intervals (CIs) were similar (r = .541, 95% CI [0.398, 0.661] and r = .500, 95% CI [0.333, 0.654]). Other studies comparing total counts and questionnaire estimates have shown different results (Tables 1 and 2).
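The effect of the cut point on the time classified as MVPA can be illustrated with a short Python sketch. The minute-by-minute counts and the two thresholds below are hypothetical values chosen for illustration; they are not the published cut points of any of the cited authors, and the function name is likewise an assumption.

```python
def minutes_at_or_above(counts_per_minute, cut_point):
    """Count the one-minute epochs whose activity counts reach the cut point."""
    return sum(1 for cpm in counts_per_minute if cpm >= cut_point)


# Hypothetical accelerometer record: ten one-minute epochs (counts per minute).
epochs = [150, 900, 1800, 1960, 2000, 2100, 2500, 3200, 5800, 400]

# Two hypothetical MVPA thresholds that differ by only 100 cpm.
lower_cut, higher_cut = 1950, 2050

mvpa_lower = minutes_at_or_above(epochs, lower_cut)
mvpa_higher = minutes_at_or_above(epochs, higher_cut)
total_counts = sum(epochs)  # does not depend on any cut point

print(mvpa_lower, mvpa_higher, total_counts)  # 6 4 20810
```

In this example, the same record yields 6 or 4 minutes of MVPA depending on the threshold, whereas the total counts are unaffected, which is the rationale for analyzing counts directly.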
Questionnaire formatting and administration may influence validity results. The Active-Q questionnaire showed a moderate correlation for VPA when compared to accelerometers (r = .54). This web-based questionnaire included 9–47 items, depending on the subject’s reported activities, because it allowed a variety of activities to be registered through submenus that asked for further details of the selected activity (Bonn et al., 2015). Adding details on PA domains such as transport and leisure time activities may have also contributed to increasing the correlation between the TPAQ and accelerometer measures (r = .72). However, the authors pointed out that this domain could have been overestimated (Adams et al., 2014).
In addition to the diversity of questionnaires, differences among the studies may be due to other factors such as the type of accelerometer, its placement site, and the characteristics of the study population. An accelerometer is a sensor that measures object acceleration along one, two, or three axes and allows estimation of the intensity and frequency of human movement. Uniaxial and triaxial accelerometers are commercially available, and their measures may vary depending on the number of axes and their placement (e.g., collar area, rear of the upper arm, forearm, front and rear sides of the ribcage, waist, thighs, shin, and top of the foot; Yang & Hsu, 2010). Triaxial accelerometers are considered better than uniaxial ones for assessing a variety of typical activities (Rowlands et al., 2004), and the waist or the hip is often selected for their placement. It is worth noting that waist- or hip-placed accelerometers may not register some upper body movements or activities performed while the subject remains seated, such as weight lifting or riding a bicycle. PA may therefore be underestimated in subjects who perform these kinds of activities frequently (Yang & Hsu, 2010).
In this review, we identified six studies using Actiheart (Bähler et al., 2013; Besson et al., 2011; Dahl-Petersen et al., 2013; Golubic et al., 2014; Moss & Czyz, 2016; Nicolaou et al., 2016), which combines an accelerometer with a heart rate monitor, and three using the SenseWear Armband (Molina et al., 2017; Sanda et al., 2017; Vancampfort et al., 2016), which combines an accelerometer with heat flux, skin temperature, and galvanic skin response. Their correlations with the estimates of PA dimensions from questionnaires were similar to the values found with other devices. This suggests that adding physiological parameters may provide further information or improve PA measurement by OT but may not influence the agreement with questionnaire estimates.
Regarding reliability, most of the reviewed studies showed good to excellent ICC, but poor ICC values were found for the Short Questionnaire to Assess Health enhancing PA (Nicolaou et al., 2016), the IPAQ-L for patients with chronic low back pain, and the GPAQ (Chu et al., 2015). Several factors influence reliability, such as test–retest period, conditions for questionnaire application, and intraindividual variability. The test–retest periods of the studies were between 3 days and 3 months. According to Terwee et al. (2010), a time interval of between 1 day and 3 months may be appropriate for measuring PA during the past week, a usual week, or the past year. The reliability of the GPAQ (Chu et al., 2015) was poor when the questionnaire was administered by interview, but self-administration improved its reliability. Vuillemin et al. (2000) argued that an interviewer’s presence may have an influence on individual responses.
PA is the most variable and second largest component of daily energy expenditure in most individuals (United Nations University & World Health Organization, 2004). Intraindividual PA variability is expected (Baranowski & de Moor, 2000) and may account for 30%–45% of the variance in healthy adults (Levin et al., 1999; Matthews et al., 2002). Intraindividual variation should therefore be accounted for in order to estimate PAEE, and strategies to adjust estimated energy requirements according to its variation should be considered.
In summary, we have reviewed studies reporting the validation of questionnaires for healthy adults and adults with special conditions, such as pregnancy and disease, which are common conditions observed in nutritional counseling. However, we should consider some limitations: (1) the full texts of three articles found in the screening were not available; (2) we searched in two databases and in the Google Scholar search engine and could have missed publications in other databases, although the inclusion of Google Scholar allowed retrieval of open-access articles; (3) several studies included additional statistical analyses to evaluate agreement and validity, but there was a wide variety of approaches; therefore, our analysis was based on the reported correlation coefficients only, which were the most commonly reported measures; and (4) because this review was focused on PA questionnaires, the number of questionnaires validating TEE against DLW could be underestimated; a review focused on TEE could provide more accurate information regarding its evaluation methods.
According to the results of this review, the short version of the JPHC-PAQ (Sasai et al., 2018), the STAR-Q (Csizmadi et al., 2014), and the RPAQ (Besson et al., 2011) may represent good options for estimating TEE in adults without disease. Considering the moderate correlation values, a certain level of inaccuracy should be expected. Furthermore, the duration of the assessment using DLW was from 11 to 15 days, and whether the values obtained are representative of the period asked for in the questionnaire requires further investigation. Questionnaires evaluating TPA and its domains must be carefully selected. The PAQ and the IPAQ-S (Rodríguez-Muñoz et al., 2017) showed moderate agreement with OTs in young adults, as did the Active-Q (Bonn et al., 2015) in elderly subjects. However, other characteristics, such as ethnic group and gender, should be considered, as evidenced by the study from Nicolaou et al. (2016). The purpose (e.g., research, individual clinical/nutritional assessment, epidemiological studies) and the availability of resources are also important aspects to consider when selecting a questionnaire. Furthermore, besides the agreement between questionnaire estimates and OT measures, the level of accuracy should be considered before questionnaires are applied to adjust for PAEE at the individual level.
In conclusion, despite having good reliability, most of the questionnaires included in this review have shown limited validity for estimating TPA in adults. Although some questionnaires may be useful for assessing TEE and some PA domains, the use of OTs, such as accelerometers, should be considered as a first option at the individual level, when possible. Further research is warranted on techniques to obtain more accurate PA and PAEE estimates.
Footnotes
Acknowledgements
The authors would like to thank the Consejo Nacional de Ciencia y Tecnología for the scholarships awarded to Angela Patricia Bacelis-Rivero and Anabel Vázquez-Rodríguez.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: We received the support of the resources of the Facultad de Ciencias de la Cultura Física and the Facultad de Medicina y Ciencias Biomédicas, Universidad Autónoma de Chihuahua.
