Abstract
Mobile phone usage is typically measured via self-reports. However, scholars have questioned the validity of self-reported data, which may lead to Type I or Type II errors. Using an online survey (n = 777), this study compared self-reported and log mobile phone usage data using a simplified version of the mobile data donation method. The results showed that people generally underreported their mobile phone usage in terms of time spent, the number of notifications, and the number of apps. Moreover, depending on the outcome variable, self-reported data may either have no additional effect on or overestimate communication findings. This challenges the Type II error explanation and suggests that effect sizes based on self-reported data might not be underestimated after all. Instead, past research examining mobile use and pertinent outcomes may contain false-positive findings (Type I errors). Given the potential inaccuracies of self-reported data, future research on mobile media and communication should go beyond self-reports to enhance the validity of findings.
Communication research has for many decades relied heavily on self-reported data to measure mobile use (Boase & Ling, 2013; Jones-Jang et al., 2020; Scharkow, 2019). Despite its prevalence, scholars have questioned the validity of this method, which can introduce measurement error (Scharkow, 2019). With the widespread adoption of mobile technologies, the increasing availability of mobile log data has expanded the opportunities for data collection in mobile communication research, as scholars can use built-in or customized mobile applications to collect behavioral data (Boase & Humphreys, 2018). Yet, while studies have highlighted significant discrepancies between self-reported and log mobile data (e.g., Burnell et al., 2021; Lin et al., 2017; Parry et al., 2021), there is still no consensus on whether research participants tend to over- or underreport their mobile use (Boase & Ling, 2013; Burnell et al., 2021; Kobayashi & Boase, 2012; Lee et al., 2017; Scharkow, 2016), or on whether these discrepancies are systematic or random, a distinction that bears on potential Type I or Type II errors (Jones-Jang et al., 2020; Scharkow, 2016). Moreover, previous studies mostly relied on small samples that focused on either iOS (e.g., Hodes & Thomas, 2021; Ohme et al., 2020) or Android users (e.g., Deng et al., 2019; Kobayashi & Boase, 2012) in Western societies, which limits the generalizability of their findings.
To address these questions and limitations, we use a larger sample (n = 777) of participants who use a variety of mobile operating systems. Specifically, this study focuses on young adults in China, who constitute the largest group of mobile users both in China and worldwide (Statista, 2020). To this end, this study adopts a simplified version of the mobile data donation method (Ohme et al., 2020) to examine the discrepancies between self-reported and log mobile behaviors. In this simplified method, participants were guided to check their phones and donate their daily averaged time spent on phones, notifications, and apps used, rather than uploading screenshots. Using this method, we examine the discrepancies between self-reported and log measures, provide evidence on whether and to what extent these discrepancies are systematic, and assess how the two types of data collection attenuate or accentuate the associations between mobile use and pertinent outcomes.
Concerns with self-reported measurements in mobile communication
The accessibility and portability of communication technologies allow mobile users to connect with others without physical or spatial limits. However, the very pervasiveness of mobile technologies also makes mobile usage difficult to estimate accurately. Many researchers have reported that people check their phones habitually (Ellis et al., 2019) or with amotivation (Wu-Ouyang, 2022), which may prevent accurate recall of mobile usage. Moreover, people use multiple platforms (e.g., Facebook and Twitter), multimodal channels (e.g., mobile phone and face-to-face interactions), or multiple screens (e.g., phone and iPad) to communicate, which makes it difficult to recall specific mobile behaviors.
Two main errors can arise when measuring self-reported mobile usage: recall bias (Ohme et al., 2020) and social desirability bias (Jones-Jang et al., 2020). Recall bias can be considered a form of random error. When scholars ask “Please estimate your phone activity yesterday” (Boase & Ling, 2013), the cognitive burden on people's memory can lead to recall bias. Previous studies showed that people are poor at recalling their phone usage, or even whether they had used their mobile devices at all (Vanden Abeele et al., 2019). This suggests that people's perceptions of their technology usage may not be consistent with their actual usage (Boase & Ling, 2013; Ellis et al., 2019). Social desirability bias is a form of systematic error that reflects people's inclination to respond in ways that are congruent with prevailing social norms. People may overreport actions that others consider desirable while underreporting activities that others evaluate as undesirable. Since mobile phone usage can be viewed as an indicator of social support, Boase and Ling (2013) posited that the desire to appear socially desirable may lead people to overreport their mobile phone usage. Conversely, people may also underreport their phone usage because they perceive it as problematic and negative (Lin et al., 2015).
Scholars have also discussed whether the discrepancies between log and self-reported measures of mobile use are systematic and, importantly, how such systematic error may affect statistical interpretations (e.g., Jones-Jang et al., 2020; Vrijheid et al., 2006). Simply put, if self-reported data contained systematic errors, effect sizes could be overestimated, leading to Type I error (i.e., rejecting the null hypothesis when it is true). In contrast, if self-reported data suffered mainly from random errors, the observed coefficients would be attenuated toward zero. This would increase the likelihood of observing a null relationship and thus of Type II error (i.e., failing to reject the null hypothesis when the alternative is true). We address this crucial question by comparing self-reported and log measures.
The discrepancies between subjective and objective measures of phone usage
Several studies have reported large discrepancies between log and self-reported measures of mobile use (e.g., Boase & Ling, 2013; Cohen & Lemish, 2003; Kobayashi & Boase, 2012; Ohme et al., 2020). For example, Cohen and Lemish (2003) compared self-reported and log phone calls among 211 Israeli participants and found weak to moderate correlations (r = .19 to .47). Boase and Ling (2013) examined instant messaging and phone calls and found moderate to high correlations (n = 426, r = .35 to .74). Recent studies by Jones-Jang et al. (2020) and Ohme et al. (2020) also found correlations between self-reported and log phone use duration, pick-ups, and notifications ranging from r = .16 to .40. Jones-Jang et al. (2020) further explained that self-reported and log data capture different aspects of media use: the former reflects people's subjective appraisal of their own media use, whereas the latter reflects their actual media use. Such measurement error may distort the statistical relationships between self-reported variables and other variables of interest.
Since prior research mostly relied on relatively small samples and a single mobile operating system, we examine whether the discrepancies hold in a larger sample of people who use different mobile operating systems. Moreover, most previous studies only highlighted differences in phone usage time. Less attention has been paid to comparing self-reported and log measures of the number of notifications (Ellis et al., 2019; Loid et al., 2020) and phone apps (Ryding & Kuss, 2020). Therefore, we propose the following hypothesis:
Mobile data donation and the tendency to under- or overreport
Some studies found that people generally underreport their phone or social media usage (Boase & Ling, 2013; Burnell et al., 2021; Kobayashi & Boase, 2012) while others found the opposite (Lee et al., 2017; Lin et al., 2017; Scharkow, 2016). Most of these studies recruited participants and used customized monitoring applications to unobtrusively measure their actual phone usage. For instance, Kobayashi and Boase (2012) asked 310 Android users to download an app designed specifically to track their mobile behaviors. Results showed that they overreported their calls, emails, and messages on phones compared with log data. Conversely, Lin et al. (2015) found that 79 young adults’ self-reported time was significantly lower than log data, which was also derived from an app. Similarly, Lee et al. (2017) found that 35 college students underreported their actual mobile usage time.
Requiring participants to install apps may threaten internal validity. As Hodes and Thomas (2021) argued, participants who know that their mobile activities are being monitored may alter their mobile usage. For instance, they may intentionally increase or decrease their phone use time to match what they perceive the researchers expect. Moreover, Boase and Humphreys (2018) noted that participants are often not fully aware of the data collection process and may turn off their devices once they become aware of it, leading to inaccurate (mostly reduced) log records of use.
More recently, Ohme et al. (2020) proposed a less intrusive methodological approach known as mobile data donation. Rather than asking users to install an app, researchers ask participants to provide screenshots from their phones' built-in applications, such as the Screen Time function on iOS or Android's digital well-being function. These applications contain useful data such as phone use time, pick-ups, notifications, and the most-used applications over the previous 7 days. By using this method, social desirability bias and recall bias can be reduced.
Ellis et al. (2019) compared self-reported data with log data derived from Apple's Screen Time application (n = 238) and found that people underreported their mobile behaviors. The correlations between self-reported and log measures were small to medium (r = .13 to .40), indicating that self-reports poorly predict log usage. Ohme et al. (2020) also found weak to moderate convergent validity (r = .30 to .40) between log and self-reported measures of notifications, screen time, and pick-ups among iOS users (n = 65). Similarly, using the Screen Time function (n = 294), Jones-Jang et al. (2020) found that people tended to underreport their actual phone use by 13% and their pick-ups by 24%.
Thus, studies using the data donation approach consistently point to a tendency to underreport. However, because of privacy concerns and the low compliance rate (around 12%) for uploading screenshots, Ohme et al. (2020) suggested that researchers simplify the donation process to increase users' agency. Moreover, according to Boase and Humphreys (2018), researchers should avoid collecting sensitive information unless absolutely necessary. We respond to these suggestions by testing a simplified version that guides participants to check their phones and donate mobile data from their phones' built-in systems. As previous studies using this approach did not cover non-Apple users, it is necessary to explore this tendency in a larger sample with a greater diversity of mobile users. Thus, we raise the following research question:
Is the discrepancy between log and self-reported mobile usage systematic?
If there are inconsistencies between self-reported and log data, the next important step is to consider whether the discrepancies are random or systematic. More specifically, will measurement error attenuate or accentuate the effect sizes for associated variables? In other words, will communication findings be overestimated or underestimated, and will they contain Type I or Type II errors? An accentuation effect would indicate that self-reported data inflate the effect sizes of associations with other variables and may produce Type I errors. An attenuation effect would suggest that the discrepancies are random, which may lead to underestimated effect sizes and Type II errors.
Findings from previous studies tend to support the Type II error explanation and the attenuation effect of self-reported data on the relationship between media use and its outcomes (e.g., Bartels, 1993; Jones-Jang et al., 2020; Vrijheid et al., 2006). Vrijheid et al. (2006) conducted a sensitivity analysis of mobile telephone use and found that random errors had a greater impact on outcome variables than systematic errors, indicating a general Type II error. Kahn and Ratan (2014) also showed that self-reported measures of video game play reduced effect sizes and introduced noise into the data. Recently, Jones-Jang et al. (2020) compared the correlations of self-reported and log smartphone data with psychosocial outcomes and found instances of Type II error, which led them to conclude that this was “good news” for communication research.
Only a handful of studies have found evidence against the attenuation argument. Kobayashi and Boase (2012) found that self-reported data overestimated the associations between phone use and civic engagement, yielding higher levels of significance and thus indicating an accentuation effect and an increased possibility of Type I error. Similarly, Scharkow (2016) found that effect sizes for self-reported internet usage were inflated compared with those for log measures. A recent study by Burnell et al. (2021) showed that the correlations between social media use and well-being became nonsignificant once log data was used, suggesting possible Type I error.
Since extant findings do not provide a unified answer to the question of possible systematic error in self-reported phone usage (see also the review by Parry et al. [2021]), and most of them did not specifically study mobile phone usage duration (only Kobayashi and Boase [2012] and Jones-Jang et al. [2020] did so), it is crucial to clarify whether the discrepancy is systematic or random. Therefore, we compare the effect sizes of self-reported and log phone usage on pertinent outcomes and put forward the following research question:
Specifically, we explored whether and to what extent the associations between log measures of mobile use and well-being differ from those between self-reported measures of mobile use and well-being. Among the various consequences of mobile use, its effects on well-being (i.e., depression, loneliness, and negative outcomes) have received much research attention (e.g., Elhai et al., 2016; Hodes & Thomas, 2021; Kwon et al., 2013). Loneliness refers to a subjective emotional state that arises when there is a discrepancy between expected and actual social connectedness (Hughes et al., 2004). Depression refers to a range of psychological problems characterized by a loss of interest in ordinary experiences (Radloff, 1977). Negative outcomes refer to the mental distraction and physical problems resulting from one's mobile usage (Kwon et al., 2013). With mobile phones increasingly penetrating everyday life, the debate about the impact of mobile usage on well-being has not produced a consistent answer. Some scholars have argued that mobile phones foster social interaction and self-disclosure, resulting in higher well-being (Chan & Li, 2020), while others have argued that mobile communication displaces face-to-face interaction, which triggers psychological problems and negative outcomes (Hodes & Thomas, 2021; Loid et al., 2020). Since the major concern of this study is the difference between self-reported and log data and its impact on negative well-being, rather than the direction of the effect, we pose the following research question:
If the pattern of misreporting is non-random, what factors can explain these systematic measurement errors? Some studies have suggested that user characteristics might contribute to these measurement errors. Kobayashi and Boase (2012) found that demographic variables and log measures did not predict underreporting, whereas Boase and Ling (2013) suggested that people with lower phone bills generally underreport. A recent study showed that gender and conscientiousness had certain effects on people's underreporting of their social media usage (Burnell et al., 2021). Other studies have suggested that people's media usage might influence measurement errors. Vanden Abeele et al. (2013) and Scharkow (2016) found that light mobile phone users tended to overreport and heavy users tended to underreport their actual phone usage, while Deng et al. (2019) reported the opposite. A recent meta-analysis by Parry et al. (2021) indicated that existing research is insufficient to conclude that the measurement discrepancy is systematic and called for more investigation. Given these inconclusive findings, we explore whether demographics and mobile behaviors affect the tendency to over- or underreport. Thus:
Method
An online survey was distributed to respondents recruited by a professional company, Sojump, which provides the largest online panel in China, with more than 260 million participants. Scholars commonly use it to sample Chinese participants (e.g., Chan & Li, 2020). Since young adults account for the largest group of mobile users in China as well as globally (Statista, 2020), quota sampling was applied so that the sample would represent people aged 18 to 35 with an undergraduate or postgraduate degree in China.
The questionnaire was fielded to 1,821 respondents in 2021, of whom 777 passed the attention check and were willing to donate their mobile data. Our sample size was larger than those of previous studies (e.g., Elhai et al., 2016; Ellis et al., 2019). Participants took around 12 min on average to complete the survey. Males constituted 51.2% of participants. Ages ranged from 18 to 35 (M = 26.03, SD = 4.66). Most participants had a monthly expenditure of 2,000 to 3,000 Chinese yuan (M = 3.38, SD = 1.34). All participants held higher education degrees.
Measures
Log and self-report mobile data (time, notifications, apps)
Self-reported data was obtained through participants' survey answers. We asked them to estimate: “In the recent 7 days, how many hours did you spend on your phone every day?”, “How many daily notifications do you receive?”, and “How many apps do you have on your phone?” For the mobile log data, we adopted a simplified version of the data donation method based on Ohme et al. (2020), with fewer requirements for collecting log data. Rather than asking respondents to upload screenshots, we guided them to check their phones and report their daily averaged usage time and notifications from the Screen Time application on iOS, or from digital balance or digital well-being and parental controls on most built-in Android systems. 1 Then, they were asked to check the number of apps on their phones. 2 If they could not find these functions, they could skip these questions. 3 Table 1 summarizes the descriptive statistics.
Data comparison of the daily self-report measure and average daily log activity on using mobile phones.
Note. M = mean; SD = standard deviation; n = number.
Well-being
We measured loneliness with the Revised UCLA Loneliness Scale (Hughes et al., 2004), a well-accepted measure of subjective feelings of loneliness (Cronbach's α = .76, M = 2.15, SD = 0.80). Depression was measured with the 8-item scale developed by Radloff (1977) (Cronbach's α = .85, M = 2.19, SD = 0.68). Negative outcomes were measured using a 5-item measure adapted from Kwon et al. (2013). A sample item is: “Feeling wrist, neck pain, or having sleep disturbance because of using too much phone.” (Cronbach's α = .71, M = 2.77, SD = 0.78).
Demographics and controls
Five questions were developed to obtain background information on sex, age, income, education status, and the screen size of their mobile phones (1 = 4 inches and below to 4 = above 6 inches).
Data analysis
The analysis comprised three steps. First, paired-sample t-tests and Wilcoxon signed-rank tests were conducted to examine the differences between log and self-reported data. Second, correlation and regression analyses were conducted to examine how the two types of data were related to the outcome variables (i.e., well-being). Third, logistic regression analyses with bootstrapping procedures were employed to explore the factors influencing the differences between log and self-reported data.
Differences and relationships between log and reported measures
The Kolmogorov–Smirnov test and Q-Q plots were used to test normality. Scatterplots can be found in Appendix A. Identified outliers (i.e., daily phone usage of more than 22 h or more than 500 notifications) were excluded from the analysis. To address H1, we compared log data with self-reported measures using paired-sample t-tests on time and number of apps. The results indicated that self-reported time (M = 6.63, SD = 2.54) was significantly lower than log time (M = 7.29, SD = 3.01), t(731) = −6.71, p < .001. Reported app numbers (M = 46.84, SD = 27.30) were significantly lower than log app numbers (M = 54.67, SD = 33.42), t(776) = −10.35, p < .001. Moreover, self-reported time was correlated with log time (r = .57, p < .001), and the estimated number of apps was also related to the log number of apps (r = .78, p < .001). Although these correlations were relatively strong, common criteria for evaluating standardized instruments require correlations above .80 (Cicchetti, 1994). Thus, the relationships were not high enough to assume that the two measures captured the same concept, which is the same conclusion reached by Boase and Ling (2013, r = .35 to .74). As the distribution of notifications was not normal, we used the Wilcoxon signed-rank test, following Hodes and Thomas (2021). The results showed that people significantly underreported their actual phone notifications, Z(699) = −8.31, p < .001. These tests supported H1 and provided initial evidence for differences between self-reported and log data.
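The paired-sample comparison above can be sketched in a few lines of standard-library Python. This is a minimal illustration: the values in the usage example are hypothetical hours, not participant data, and the function returns only the t statistic and its degrees of freedom. A p-value and the Wilcoxon signed-rank test used for notifications would typically come from a statistics package, for example `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon`.

```python
import math
import statistics

def paired_t(self_report, log):
    """Paired-sample t statistic for (self-report minus log) differences.

    Returns (t, df). A negative t means self-reports are lower on
    average than log values, i.e., underreporting.
    """
    if len(self_report) != len(log):
        raise ValueError("inputs must be paired (same length)")
    diffs = [s - l for s, l in zip(self_report, log)]
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean_d / se, n - 1

if __name__ == "__main__":
    # Hypothetical daily hours for five respondents (illustration only).
    print(paired_t([5, 6, 4, 7, 6], [6, 7, 6, 8, 6]))
```

For these hypothetical inputs, the mean difference is −1 h and the function returns t ≈ −3.16 with df = 4. The negative sign indicates underreporting.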
To answer RQ1 and understand how self-reported data differs from log data, we directly compared the means and medians. As shown in Table 1, the mean self-reported time was 6.63 h, whereas the mean log time was about 10% higher (M = 7.29 h). Similarly, the median self-reported time was 6 h, whereas the median log time was 6.65 h. These results indicate that people generally underreported their mobile usage time. The measures of notifications and apps followed similar patterns. The mean and median of self-reported apps (M = 46.84, median = 40) were both lower than those of log apps (M = 54.67, median = 48). Meanwhile, people reported about 12% fewer notifications than the log measures indicated (M = 76.16). Overall, the findings suggest that people tended to underreport.
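The 10% figure can be checked directly from the means in Table 1 by expressing the gap relative to the self-reported mean. The same calculation for apps, (54.67 − 46.84)/46.84, gives about 17%:

```latex
\frac{\bar{x}_{\mathrm{log}} - \bar{x}_{\mathrm{self}}}{\bar{x}_{\mathrm{self}}}
  = \frac{7.29 - 6.63}{6.63}
  = \frac{0.66}{6.63}
  \approx 0.10
```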
Testing relationships with well-being
In response to RQ2, we conducted partial correlation analyses with control variables. As Table 2 shows, self-reported time was significantly correlated with depression (r = .08, p < .05), loneliness (r = .08, p < .05), and negative outcomes (r = .13, p < .01). However, these correlations became nonsignificant when log time was used (depression: r = .02, p = .61; loneliness: r = .03, p = .38; negative outcomes: r = .06, p = .11). We then conducted two-tailed z-tests (Cohen, 2013) to examine whether the log and self-reported measures differed significantly in their correlations with the outcome variables. The association between self-reported time and negative outcomes was significantly larger than that between log time and negative outcomes (z = 1.99, p < .05). The associations between self-reported time and the other psychological outcomes were also larger than those between log time and these outcomes, but not significantly so (depression: z = 1.70, p = .09; loneliness: z = 1.41, p = .16). These results suggest a potential overestimation of the effect size for self-reported data linked with negative outcomes, but they do not provide sufficient support for the other well-being variables. 4
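The text does not state which z-test formula was used. Both correlations share the same outcome, and self-reported and log time are themselves correlated (r = .57), so a test for dependent correlations is needed. The Python sketch below assumes the Meng, Rosenthal, and Rubin (1992) formulation, one standard choice for this situation. Plugging in the reported partial correlations with n = 723 gives z ≈ 2.04 (p ≈ .04) for negative outcomes and z ≈ 1.74 (p ≈ .08) for depression. These values are close to the reported ones; the small differences may reflect rounding of the correlations or a different variant of the test.

```python
import math

def compare_dependent_r(r1, r2, r12, n):
    """Two-tailed z-test for two correlations that share one variable.

    r1:  corr(self-reported time, outcome)
    r2:  corr(log time, outcome)
    r12: corr(self-reported time, log time)
    Assumes the Meng, Rosenthal & Rubin (1992) formula; returns (z, p).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher r-to-z transform
    mean_r_sq = (r1**2 + r2**2) / 2
    f = min((1 - r12) / (2 * (1 - mean_r_sq)), 1.0)  # f is capped at 1
    h = (1 - f * mean_r_sq) / (1 - mean_r_sq)
    z = (z1 - z2) * math.sqrt((n - 3) / (2 * (1 - r12) * h))
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p from the normal curve
    return z, p

if __name__ == "__main__":
    # Partial correlations reported above for negative outcomes and depression.
    print(compare_dependent_r(0.13, 0.06, 0.57, 723))
    print(compare_dependent_r(0.08, 0.02, 0.57, 723))
```

The same normal-curve calculation illustrates the one- versus two-tailed issue raised in the Discussion. For z = 1.88, the two-tailed p is about .06, and halving it gives a one-tailed p of about .03.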
Partial correlation matrix of self-reported/log time with psychological outcomes.
Note. n = 723, * p < .05, ** p < .01, *** p < .001. Control variables are sex, income, age, and education.
Next, we ran hierarchical linear regression analyses to explain these well-being outcomes (RQ3). Table 3 shows that self-reported time was associated with loneliness (β = .09, p < .05), depression (β = .10, p < .05), and negative outcomes (β = .14, p < .01), while log time did not explain any of the outcomes. 5 To test whether the standardized beta weights of self-reported and log time were statistically different, we transformed these variables into z scores and calculated their 95% confidence intervals (CIs) using 5,000 bootstrapped samples (see Table 3). According to Cumming (2009), if two 95% CIs overlap by no more than about half of their average margin of error, the corresponding coefficients can be considered statistically different (roughly p < .05). After calculation, 6 we found that the coefficient of self-reported time on negative outcomes can be considered statistically larger than that of log time, whereas the coefficients of self-reported and log time were not statistically different for loneliness and depression.
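Cumming's overlap heuristic can be expressed as a small Python function. This is a sketch: the rule of thumb was developed for independent intervals of similar width, and the intervals in the usage example are hypothetical rather than the values in Table 3.

```python
def proportion_overlap(ci_a, ci_b):
    """Overlap of two CIs as a proportion of their average margin of error.

    ci_a, ci_b: (lower, upper) tuples. The result is negative when the
    intervals are separated by a gap.
    """
    (lo_a, hi_a), (lo_b, hi_b) = ci_a, ci_b
    moe_a = (hi_a - lo_a) / 2  # margin of error = half the CI width
    moe_b = (hi_b - lo_b) / 2
    overlap = min(hi_a, hi_b) - max(lo_a, lo_b)
    return overlap / ((moe_a + moe_b) / 2)


def differ_by_rule_of_thumb(ci_a, ci_b, threshold=0.5):
    """Heuristic: overlap of at most about half the average margin of
    error corresponds roughly to p < .05 for independent estimates."""
    return proportion_overlap(ci_a, ci_b) <= threshold
```

For hypothetical intervals (0.10, 0.30) and (0.27, 0.47), the overlap is 0.03, or 0.3 of the average margin of error (0.10), so the heuristic flags the coefficients as different. For (0.10, 0.30) and (0.15, 0.35), the proportion is 1.5, and the heuristic does not flag them.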
Regression models of well-being.
Note. n = 728. Betas are standardized coefficients from the final step of the model. Each variable was transformed to z scores, and 95% confidence intervals (CIs) were calculated using 5,000 bootstrapped samples.
* p < .05, ** p < .01, *** p < .001.
We also calculated effect sizes following Cohen (2013). The effect size of self-reported time (Cohen's f² = .02) exceeded that of log time (Cohen's f² = .00) only when negative outcomes was the dependent variable. All other effect sizes were smaller than .02, Cohen's benchmark for a small effect, suggesting negligible effects.
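Cohen's f² for a predictor in a hierarchical model is usually computed from the change in R² when that predictor is added. The text does not report the R² values, so the Python sketch below uses hypothetical ones and assumes the local (incremental) form of f². The benchmark labels follow Cohen's conventional cut-offs.

```python
def cohens_f2(r2_full, r2_reduced=0.0):
    """Cohen's f-squared.

    With r2_reduced = 0 this is the global f2 = R2 / (1 - R2); otherwise it
    is the local effect of the predictor(s) added to the reduced model:
    (R2_full - R2_reduced) / (1 - R2_full).
    """
    if not 0 <= r2_reduced <= r2_full < 1:
        raise ValueError("need 0 <= R2_reduced <= R2_full < 1")
    return (r2_full - r2_reduced) / (1 - r2_full)


def label_f2(f2):
    # Cohen's conventional benchmarks: .02 small, .15 medium, .35 large.
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "negligible"
```

For example, if adding time raised R² from a hypothetical .03 to .05, f² = .02/.95 ≈ .021, which falls just above the small-effect benchmark.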
In sum, the mean comparisons, correlations, and regression coefficients indicated that the relationships between log measures and depression, loneliness, and negative outcomes were weaker than those involving self-reported measures. However, the z-tests, CI estimations, and effect size comparisons showed that the differences were significant only when the dependent variable was negative outcomes. These findings need to be interpreted with caution. We conclude that self-reported data has either no additional effect or an overestimated effect, depending on the outcome variable. Specifically, when self-reported and log time were linked with loneliness and depression, there were no statistical differences; when the dependent variable was negative outcomes, the coefficient of self-reported time can be considered statistically larger than that of log time, indicating a potential accentuation effect.
Multivariate logistic regression analysis
To understand the factors explaining people's over- or underreporting (RQ4), we created two series of dummy variables. 7 One indicated the tendency to overreport (the self-reported measure is greater than the log measure), and the other indicated the opposite (underreport). Several multivariate logistic regression analyses were then conducted, and the results are shown in Table 4. All models were significant except the one for underreporting the number of apps.
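The coding of the two misreporting dummies can be sketched as follows. How exact matches between self-reported and log values were handled is described in a footnote that is not reproduced here, so the sketch assumes that ties are coded 0 on both dummies. The input pairs are hypothetical.

```python
def misreport_dummies(self_report, log):
    """Return (overreport, underreport) dummies for one measure.

    Ties (self_report == log) are coded 0 on both dummies; this is an
    assumption, since the handling of exact matches is not stated here.
    """
    overreport = 1 if self_report > log else 0
    underreport = 1 if self_report < log else 0
    return overreport, underreport


def code_sample(pairs):
    """Code (self_report, log) pairs; return codes and the share of each tendency."""
    coded = [misreport_dummies(s, l) for s, l in pairs]
    n = len(coded)
    share_over = sum(o for o, _ in coded) / n
    share_under = sum(u for _, u in coded) / n
    return coded, share_over, share_under


if __name__ == "__main__":
    # Hypothetical (self-reported, log) daily hours.
    print(code_sample([(6, 7.5), (8, 6), (5, 5), (4, 6)]))
```

Under this coding, a respondent can be an overreporter, an underreporter, or neither. This is one reason to fit a separate logistic model for each dummy rather than a single binary outcome.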
For the time measures, the first blocks containing demographics were both nonsignificant. When mobile usage variables were added, the models became significant for both the tendency to overreport (χ² = 68.13, p < .001) and the tendency to underreport (χ² = 60.11, p < .001), increasing the percentage of correctly classified cases to 69.7% and 66.9% and Nagelkerke R² to .13 and .11, respectively. The Wald criterion demonstrated that log time was negatively associated with the tendency to overreport (B = −.26, p < .001) and positively associated with the tendency to underreport (B = .24, p < .001). Moreover, being female (B = .40, p < .05) and being younger (B = −.06, p < .01) were associated with the tendency to overreport. Similarly, for the notification measures, log notifications were negatively associated with the tendency to overreport (B = −.01, p < .01) and positively associated with the tendency to underreport (B = .01, p < .001). Education was negatively associated with the tendency to overreport (B = −.46, p < .05). For the misreporting of apps, the number of apps used was negatively associated with the tendency to overreport (B = −.007, p < .05) and positively associated with the tendency to underreport (B = .02, p < .001), while being female was positively associated with the tendency to underreport (B = .40, p < .001). 8
To further examine whether misreporting tendencies differ across levels of log usage (see Appendix C), we conducted post-hoc analyses and created a dummy variable for each of the three measures based on a median split (1 = heavy users). Chi-square tests were performed to determine whether the tendency to misreport was related to heavy or light usage. The results confirmed the differences and indicated that light users tend to overreport, while heavy users are likely to underreport (time: χ²(1, n = 732) = 30.52, p < .001; notifications: χ²(1, n = 709) = 12.23, p < .001; apps: χ²(1, n = 777) = 36.70, p < .001). For example, the odds of underreporting time were 2.37 times higher for heavy users than for light users (OR = 2.37, 95% CI [1.74, 3.23]), while the odds of overreporting time were 2.17 times higher for light users than for heavy users (OR = 2.17, 95% CI [1.58, 2.98]).
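The median split and the 2 × 2 tests behind these chi-square values and odds ratios can be sketched in Python as below. The cell counts are not reported in the text, so the example uses hypothetical counts. Coding values equal to the median as light users is also an assumption. The odds ratio uses a Woolf (log-based) 95% CI, and the chi-square omits a continuity correction.

```python
import math
import statistics

def median_split(values):
    """1 = heavy user (above the sample median), 0 = light user.
    Coding values equal to the median as light is an assumption."""
    med = statistics.median(values)
    return [1 if v > med else 0 for v in values]


def two_by_two(a, b, c, d):
    """Pearson chi-square (df = 1, no continuity correction) and odds ratio
    with a Woolf 95% CI for the table (assumes no empty cells):

                    underreport   not underreport
        heavy            a              b
        light            c              d
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    hi = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return chi2, odds_ratio, (lo, hi)


if __name__ == "__main__":
    # Hypothetical cell counts (illustration only).
    print(two_by_two(10, 20, 30, 40))
```

With hypothetical counts a = 10, b = 20, c = 30, d = 40, the function returns χ² ≈ 0.79 and OR ≈ 0.67. An odds ratio compares odds, not probabilities. Thus OR = 2.37 means that the odds of underreporting are 2.37 times higher for heavy users.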
In sum, our analysis identified several demographic and mobile usage variables associated with people's tendency to misreport. The results generally indicated that light users tend to overreport, while heavy users are more likely to underreport their time spent, notifications, and number of apps.
Discussion
Compared with previous studies, which were limited to small samples and a single mobile operating system, our study used a large and diverse sample varying in age, income, education, and operating system. We applied the simplified data donation method to collect log data from phone users, and our analyses confirmed substantial discrepancies between log and self-reported data, consistent with previous research (e.g., Ellis et al., 2019; Loid et al., 2020; Ryding & Kuss, 2020). Importantly, we found that people's overall tendency is to underreport, which differs from some studies (Deng et al., 2019; Lee et al., 2017; Lin et al., 2017) but is in line with more recent ones (Burnell et al., 2021; Ellis et al., 2019; Jones-Jang et al., 2020; Ohme et al., 2020).
The findings echo the argument for social desirability bias in self-reported mobile usage and support the assumption that people may view high levels of phone usage as negative and therefore underreport. Empirical studies have suggested that excessive phone usage is linked with addiction symptoms such as tolerance (Lin et al., 2015). Given these negative outcomes of phone usage, people may purposefully underreport their usage duration to conform to social expectations. Alternatively, this result might be explained by prospect theory from cognitive psychology (Tversky & Kahneman, 1981). When faced with decisions involving possible gains, such as acquiring favor from social groups, people tend to be risk-averse (i.e., they underreport). Furthermore, as mobile use has become habitual, people may use their phones in an amotivated way without noticing the passing of time (Wu-Ouyang, 2022), which can result in underreporting.
Most notably, this study demonstrated non-random discrepancies between self-reported and log mobile data, challenging the Type II error explanation and the attenuation effect of self-reported data on the relationship between media use and its outcomes. Instead, this study showed that, depending on the outcome variable, self-reported data might either have no additional effect on or overestimate communication findings. Specifically, our results showed significant differences between log and self-reported data in explaining negative outcomes, but not loneliness or depression. If the latter pattern holds in most cases, this is good news for the field: even though self-reported usage is underreported, its effects do not differ statistically from those of log data. The former finding might be more noteworthy because it is in line with Kobayashi and Boase (2012) and Scharkow (2016). It supports the possibility of systematic error and the notion that past research on mobile use and pertinent outcomes could contain Type I errors, as effect sizes based on self-reported data could be overestimated and thus lead to false-positive findings.
In general, these findings contradict those of Jones-Jang et al. (2020). Rather than being “good news,” our findings suggest that communication findings might not be underestimated after all, so communication researchers should be cautious when interpreting findings based on self-reported data. It should be noted that Jones-Jang et al. (2020) computed only bivariate correlations without control variables. Moreover, when using z-tests to compare paired correlations, Jones-Jang et al. (2020) applied one-tailed rather than two-tailed tests (e.g., z = 1.88, p < .05), and two-tailed tests on the same statistics indicated no statistically significant differences in most cases. It might therefore be premature to claim that self-reported measures produce smaller effect sizes than log data. Instead, we argue that self-reported data may produce either no additional effect or an overestimated effect depending on the outcome variable, corresponding to no error or a Type I error in estimates of mobile effects.
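To illustrate why the choice of tail matters, the sketch below converts a z statistic into one- and two-tailed p-values under the standard normal distribution that z-tests for comparing correlations rely on. The value z = 1.88 comes from the example above; the function names are illustrative assumptions and do not represent the analysis code of this study or of Jones-Jang et al. (2020).

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_values(z: float) -> tuple[float, float]:
    """Return (one-tailed, two-tailed) p-values for a z statistic."""
    one_tailed = 1.0 - normal_cdf(abs(z))  # area in a single tail
    two_tailed = 2.0 * one_tailed          # area in both tails
    return one_tailed, two_tailed


one, two = p_values(1.88)
print(f"one-tailed p = {one:.3f}, two-tailed p = {two:.3f}")
```

Under these assumptions, z = 1.88 falls below the .05 threshold only when a single tail is tested; doubling the tail area for a two-tailed test pushes p to roughly .06.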
The third research question further explored the factors explaining the non-random discrepancies. Importantly, mobile media use explained these discrepancies to some extent, since all log measures of phone use were associated with the tendency to misreport. Chi-square tests further confirmed differences across users' media-use profiles: heavy users were more likely to underreport, whereas light users tended to overreport. This finding is in line with some research (Scharkow, 2016; Vanden Abeele et al., 2013) but differs from Deng et al. (2019), in which heavy users overreported and light users underreported their phone usage. One plausible reason is social desirability bias. Since mobile usage can be viewed as an indicator of social support (Chan & Li, 2020), light users may overreport their phone usage to meet their social needs, while heavy users may underreport theirs to draw less social attention. This pattern is also consistent with Lichtenstein et al. (1978), who found that people overestimate the frequency of rare events and underestimate the frequency of common ones.
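As an illustration of how over- and underreporting can be operationalized before a chi-square test, the sketch below classifies each participant by the signed discrepancy between self-reported and logged daily minutes and cross-tabulates the result by usage profile. This section does not specify the study's exact classification rule; the 30-minute tolerance band, the 240-minute heavy-user cutoff, and the sample values are all hypothetical.

```python
from collections import Counter
from enum import Enum


class Reporting(Enum):
    OVER = "overreport"
    UNDER = "underreport"
    ACCURATE = "accurate"


def classify(self_reported_min: float, log_min: float,
             tolerance_min: float = 30.0) -> Reporting:
    """Classify one participant by the signed discrepancy (self-report minus log).

    tolerance_min is a HYPOTHETICAL band within which a report counts as accurate.
    """
    diff = self_reported_min - log_min
    if diff > tolerance_min:
        return Reporting.OVER
    if diff < -tolerance_min:
        return Reporting.UNDER
    return Reporting.ACCURATE


def crosstab(participants, heavy_cutoff_min: float = 240.0) -> Counter:
    """Count (usage profile, reporting type) pairs for a contingency table.

    Profiles are based on LOGGED minutes; heavy_cutoff_min is a HYPOTHETICAL cutoff.
    """
    table = Counter()
    for self_min, log_min in participants:
        profile = "heavy" if log_min >= heavy_cutoff_min else "light"
        table[(profile, classify(self_min, log_min))] += 1
    return table


# Hypothetical (self-reported, logged) daily minutes, not data from this study.
sample = [(240, 420), (180, 360), (120, 60), (90, 45), (200, 210)]
print(crosstab(sample))
```

The resulting counts form the profile-by-reporting contingency table on which a chi-square test of independence would then be run; the thresholds would need to be justified against the actual data.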
Some demographic variables also contributed to misreporting. According to Table 4, older participants were more likely to underreport their mobile time. Because social desirability tends to increase with age (Vigil-Colet et al., 2013), older adults may deliberately underreport their phone usage time. Moreover, being female was positively associated with overreporting time and underreporting the number of apps, which is inconsistent with previous studies (Boase & Ling, 2013; Jones-Jang et al., 2020; Vanden Abeele et al., 2013). These results may reflect gender differences in social desirability. A tentative explanation is that, because being female is positively associated with social desirability (Vigil-Colet et al., 2013), women may regard using a limited set of apps but spending more time on them as socially worthwhile and desirable, since doing so may strengthen closeness with their strong ties. According to Table 4, higher education was associated with underreporting of notifications. Besides social desirability, another possible reason is that more highly educated people may receive more information-rich notifications, such as news and emails, which can lead to information overload and a tendency to underreport. Although these demographic variables did not explain much of the variance in the model, similar to Boase and Ling (2013), a low share of explained variance does not preclude the estimates from being reliable estimates of the ceteris paribus effects of each independent variable on loneliness/depression/negative outcomes (Wooldridge, 2015, p. 81). Future studies could identify additional covariates (e.g., social desirability [Vigil-Colet et al., 2013], self-disclosure [Chan & Li, 2020]) or draw on physiological psychology to mitigate the influence of systematic error.
Multivariate logistic regression models of the tendency to over- and underreport mobile behaviors.
Note. Betas are unstandardized coefficients from the final step of entry.
* p < .05, ** p < .01, *** p < .001.
Overall, this study documented people's tendency to underreport their mobile use and demonstrated the extent to which this discrepancy might influence the pertinent findings. Since this study supported the possibility of either no additional error or systematic error depending on the outcome variable, we raise critical doubts about the common Type II error explanation and suggest that, before drawing conclusions about mobile use and its effects, researchers should try to recognize and mitigate non-random errors.
Limitations and future directions
The study has several limitations. First, since self-reported and log data were collected in the same questionnaire, participants may have checked their log time first and then given the same answers for both. However, since considerable differences were found between the two types of data, it is unlikely that many participants did this. Future studies may conduct experiments or longitudinal studies to test the robustness of these findings. The second concern stems from the different mobile operating systems included. Because not all operating systems provide the same functions (e.g., notification counts), relying on data reported by their built-in applications led to missing values. For example, Huawei, a major mobile phone vendor in China, does not provide notification data, while Oppo, another vendor, does not provide pick-up counts. Although the proportion of missing notification data in this paper (around 5%) is within the acceptable range (Dong & Peng, 2013), we could not compare phone companies. Future studies should collect data on phone models and make such comparisons. Moreover, because we did not request pick-up data, we could not examine how these data differ from self-reported data or how they relate to outcome variables. Jones-Jang et al. (2020) found marginal differences in pick-up counts between log and self-reported data, and pick-up counts are also associated with problematic use. Future studies could examine additional available metrics.
Another methodological concern is the application of the data donation method. Following the suggestions of Ohme et al. (2020), we used a simplified method of data donation. Unlike Ohme et al. (2020), we did not ask participants to upload screenshots; instead, we gave them instructions on how to check screen time for the top five mobile operators. Two reasons account for these changes. First, as Ohme et al. (2020) pointed out, their final donors showed higher phone savviness and lower privacy concerns than non-donors, so the original donation method may lead to sample bias. Second, the compliance rate for donating screenshots was below 12%, which makes it harder for researchers to obtain a representative sample. Moreover, because our participants used different mobile systems, analyzing screenshots uploaded from different devices would pose an additional practical challenge. Since researchers should acquire sensitive information only when absolutely necessary (Boase & Humphreys, 2018), we adopted a more cost-effective version of the data donation method.
Admittedly, as an aided recall measure, self-reported log data lies between self-reports and log data. Participants may misreport their phone use randomly if they cannot find the Screen Time function. Although collaborating with telecommunication companies or providing participants with prepared mobile phones would be beneficial, these approaches are time-consuming and may bias internal validity, so we had to make several trade-offs. Moreover, we believe this aided method is better than direct recall of average daily mobile usage. According to Tourangeau et al. (2000), memory continues to decay after an event occurs. Aided recall, as a technique that provides cues to enhance memory, can help most participants retrieve and report usage from their phone applications. In particular, Tourangeau et al. (2000) argued that aided strategies are most appropriate when people tend to underreport. Future studies should compare various forms of log mobile data (e.g., third-party, screenshot, self-reported, and experimental) to validate this aided measure. Alternatively, studies could use anchoring statements (Scharkow, 2016) to complement this reported log data. For instance, survey instructions could include information about average log time, such as “On average, people use their mobile phone for more than 6 h per day.” Researchers could also tell participants that their mobile usage might be underestimated, as a debiasing strategy (Lichtenstein et al., 1978).
Last but not least, the differences from previous findings could be attributed to this study's diverse sample, which varied in age, country, and mobile operator. Specifically, the current sample comprised older young adults (M = 27.03) than most previous studies (see Ryding and Kuss’s [2020] review). Because age was negatively associated with the tendency to overreport time, older participants may have deliberately underreported their phone usage time. Another possible explanation concerns mobile operators. Since this study recruited a sample using various mobile operating systems, the results might differ from previous studies that relied on a single operator. For example, Götz et al. (2017) found that iPhone users were more likely to be wealthier and more extraverted, while Android users tended to be more open to experience. Such personality factors may affect reporting tendencies. Cultural factors may also contribute to the differences. For example, social desirability bias or guanxi (Huang, 2001) might be stronger in China, increasing participants’ tendency to underreport. Internet censorship (Tai & Fu, 2020) may also affect misreporting. Future studies might compare countries and explore potential cultural differences.
Despite these limitations and concerns, this paper makes several contributions to mobile communication research. By examining people's tendency to underreport and the extent to which discrepancies between self-reported and log mobile data might influence pertinent findings, this paper casts doubt on the common Type II error explanation and finds that self-reported mobile data contain either no additional error or Type I error, depending on the outcome variable. The Type I error further suggests that false-positive results may occur when self-reported data are used as the explanatory variable, which indicates that some previous studies using self-reported mobile data may have inflated correlational or predictive findings relative to log data.
Furthermore, the accentuation effect of self-reported data has crucial implications for knowledge creation and self-diagnosis. If the pathways from mobile usage to addiction are not significant but self-reported data falsely indicate significance, researchers will draw wrong conclusions about the impact of mobile use on well-being, leading to misjudgments by individuals and institutions. Given the potential inaccuracies of self-reported data, future communication research should seriously consider log data as a more robust method to enhance both the internal and external validity of subsequent findings, taking us one step closer to understanding the impact of mobile media.
Supplemental Material
Supplemental material, sj-docx-1-mmc-10.1177_20501579221137162 for Overestimating or underestimating communication findings? Comparing self-reported with log mobile data by data donation method by Biying Wu-Ouyang and Michael Chan in Mobile Media & Communication
Footnotes
Acknowledgements
We would like to thank the editors and three anonymous reviewers for their beneficial help in developing this manuscript.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the School of Journalism and Communication, Chinese University of Hong Kong.
Supplemental material
Supplemental material for this article is available online.
Notes
Author biographies
Biying Wu-Ouyang is a PhD student at the School of Journalism and Communication, Chinese University of Hong Kong. Her research focuses on emerging media technologies (including mobile media, social media, and VR/AR), media psychology, and interpersonal and intercultural communication. ORCID: http://orcid.org/0000-0003-4114-6367. Website:
References
