Abstract
This article explicates the concept of user engagement by synthesizing a disparate body of scholarship, and proposes a measurement model and a structural model for empirically capturing the meaning and process of user engagement, specifically in the context of interactive media. A second-order confirmatory factor analysis of data from two experiments (N = 263) shows that four attributes (physical interaction, interface assessment, absorption, and digital outreach) together constitute a valid and reliable operationalization of the concept of user engagement. A structural equation model reveals that a greater amount of physical interaction with the interface and a more positive assessment of the interface predict cognitive absorption with the content, which in turn is associated with greater behavioral intention to manage and socially distribute the content. In addition, predictive validity tests show that the four subscales are predictors of attitudinal and learning outcomes.
Introduction
With the rise in interactive media, the concept of user engagement has become a veritable buzzword, with those in the industry choosing “engagement-based” media over “impression-based” media (Morrissey, 2009; Stanley, 2013). Although mass communication researchers have long wondered how to engage users with media content and what effects such exposure will have on media audiences’ attitudes and behavior (e.g., Chaffee & Schleuder, 1986; Price & Zaller, 1993; Wang, 2006), scholars today have to contend with several different avenues to capture user engagement as a result of unique audience interactions with the numerous affordances offered by interactive media. Napoli (2011) argues that the emergence of new media technologies has urged the industry to look beyond simple exposure-based metrics (e.g., time spent with a medium), and instead make “efforts to dig deeper into the nature of media consumption” (p. 95).
For instance, marketing and advertising professionals have used the concept of user engagement to illustrate how electronic word-of-mouth (e-WoM) campaigns have the potential to spawn widespread discussions and debates in social media settings such as Facebook and Twitter (Falls, 2013; Hanna, Rohm, & Crittenden, 2011). Researchers have examined the extent to which individuals feel absorbed, transported, and immersed while playing video games (e.g., Csikszentmihalyi, 1990). In the area of human-computer interaction (HCI), the concept of user engagement includes both the perceived quality of the system (e.g., aesthetics and novelty) and the psychological and behavioral outcomes of interacting with the system (e.g., focused attention and endurability; O’Brien & Toms, 2008).
The examples above illustrate the conceptual complexity as well as the confusion that surround this emergent variable, thus underscoring the need for an explication (Chaffee, 1991) that identifies exactly what constitutes user engagement, uncovers its different facets, and ultimately helps us operationalize it in meaningful and useful ways. The relevance of studying user engagement at this juncture also marks a departure from traditional media outcomes such as message reception and message retention (e.g., Geiger & Reeves, 1993; Lang, Zhou, Schwartz, Bolls, & Potter, 2000), which tend to be passive. By contrast, in the current media environment, user engagement has become an essential outcome of active user-system interaction afforded by unique technological features such as interactivity. Such interaction-based engagement is said to critically influence the perception of content offered by the medium (e.g., Bardzell, Bardzell, Pace, & Karnell, 2008; Sundar, 2007). Thus, the current focus on user engagement is shifting toward more active outcomes where users can make their presence felt via various activities, including spreading and sharing media content, as has become common in online social media.
The goal of this article is to explicate user engagement by synthesizing a disparate body of scholarship pertaining to interactive media, to create a common theoretical understanding of the concept, and to empirically test a measurement model of the concept. We propose four critical components of user engagement that lie on a continuum in the context of interactive media, based on the process-based view of user engagement proposed by previous scholars (Napoli, 2011; O’Brien & Toms, 2008; Oh, Bellur, & Sundar, 2010). Other scholars have also argued similarly for the need to go beyond thinking of engagement as a “single event,” and instead conceptualize it as “interconnected gears” comprising cognitive, affective, and behavioral components (Interactive Advertising Bureau [IAB], 2014, p. 6). We validate the four-component model through a confirmatory factor analysis (CFA) and a structural equation model, using data from two experiments.
Explicating User Engagement
The term engagement, as defined in dictionaries, seems to have two common factors: to enter into a contact and to occupy attention or involve effort for a long period of time (Peters, Castellano, & de Freitas, 2009). O’Brien and Toms (2008) view the concept of engagement as a process including both the point of engagement and the period of engagement. They suggest that users pay attention to the overall aesthetics of the system at the beginning stages of interaction, and thereafter sustain that attention by using interactive features (e.g., exploring different actions afforded by the system), by challenging their skills, and by experiencing novel interactions. Thus, user engagement includes both the first encounter with the message or the medium as well as a sustained period of involvement with media content that can be affected by the initial interaction with the message or medium.
Another way to decompose the meaning of user engagement is by distinguishing the psychological dimension from the behavioral dimension. Although previous literature has focused on engagement as a psychological experience, such as the degree to which a user is cognitively involved in a task at hand (e.g., Busselle & Bilandzic, 2008; Strange & Leung, 1999; Wang, 2006), users’ actual behavior, such as clicking hidden content and exploring interactive features, is also a meaningful way of defining user engagement. In fact, this is how the online advertising industry often defines engagement, using metrics such as click-through rate, number of page views, and return rate (Lehmann, Lalmas, Yom-Tov, & Dupret, 2012).
Building on these previous studies, we propose a user engagement continuum that begins with users’ preliminary assessment of, and interaction with, interactive media interfaces, followed by deeper absorption with media content and behavioral outcomes. In the following sections, we identify four attributes of user engagement and construct a conceptual model by building on previous research.
User Engagement as a Psychological Experience: Absorption and Interface Assessment
Previous definitions of user engagement with media commonly refer to a strong cognitive and emotional focus on media content that completely immerses users in a mediated experience (e.g., Busselle & Bilandzic, 2009; Klimmt & Vorderer, 2003; O’Brien & Toms, 2010; Slater & Rouner, 2002; Strange & Leung, 1999). Regardless of media involved, these definitions have maintained that user engagement is a phenomenon where viewers or readers are completely invested in the unfolding of the media content, often oblivious to the surrounding environment. For instance, narrative engagement or transportation refers to construction of mental models where all mental systems and capacities become focused on events occurring in the narrative (e.g., Busselle & Bilandzic, 2008; Green & Brock, 2000). In the advertising literature, consumers are said to be engaged when they feel strongly connected to a brand’s message, seeking emotional rewards such as stimulation, inspiration, and enjoyment (Calder, Malthouse, & Schaedel, 2009; Mollen & Wilson, 2010).
As a closely related term, absorption also refers to a state of deep involvement with media. In the HCI context, absorption has been conceptualized as the degree to which users experience temporal dissociation, focused immersion, heightened enjoyment, curiosity, and control over the computer interaction (Agarwal & Karahanna, 2000). Reviewing the overlapping concepts, Slater and Rouner (2002) suggest that engagement, absorption, and transportation refer to the same phenomenon—the degree to which a message recipient is cognitively and affectively involved in the vicarious experience of media interaction.
From our perspective, the prior conceptualizations of user engagement often fail to specify how such strong involvement with media can be initiated in the first place, because they focus solely on how users subjectively think and feel about media content. What is novel about user engagement with interactive media lies in this first step, namely, the ability of such media to initiate and maintain active user interaction with the interface. With interactive media, users are likely to encounter the interface of the system before they evaluate the content. For instance, in the case of a website, users are likely to begin by processing some preliminary information about the interface, such as its visual features, aesthetic appeal, and perceived usability, before they get absorbed (or drawn) into the content of the website (Lindgaard, Dudek, Sen, Sumegi, & Noonan, 2011; O’Brien & Toms, 2010; Sundar, Bellur, Oh, Xu, & Jia, 2014). If users find the website interface easy to use and intriguing enough, they are more likely to get deeply involved in the mediated content of the interface (i.e., absorption; O’Brien & Toms, 2010; Sutcliffe, 2009). Thus, user engagement with interactive media should include users’ preliminary appraisal of interface quality, which is likely to shape their absorption in the content delivered by the interface.
In fact, literature exists to support the idea that user engagement begins with attraction, curiosity, and intrinsic interest toward the medium or interface. User engagement has been defined as an intrinsically motivated attraction to a media system (Jacques, Preece, & Carey, 1995) and as “a state of playfulness which includes attention, focus, curiosity, and intrinsic interest” on multimedia (Webster & Ho, 1997, p. 65). O’Brien and Toms (2010) suggest that aesthetics and novelty of the system first induce involvement and attention, which in turn, increase perceived usability and finally lead to endurability of system use. Bickmore, Consolvo, and Intille (2009) also conceptualize user engagement as “maintenance of user-adherence” to a desired interaction usage pattern evoked by a successful design (p. 4807).
In sum, previous literature suggests that a conceptual model of user engagement with interactive media should include both users’ attraction or interest toward the medium—which we term interface assessment—as well as subsequent absorption with content. Formally, we propose the following hypothesis for study:
User Engagement as a Behavioral Experience: Physical Interaction and Digital Outreach
Although previous literature has focused on engagement as a psychological experience, a few definitions try to broaden this focus and include the physical dimension as well. This is especially pertinent with newer forms of media where users are not only processing information internally (i.e., as mental states), but they are also controlling the flow and the content of incoming information by physically interacting with the medium. Thanks to the interactivity offered by modern interfaces, users are able to perform a number of actions, such as scrolling, swiping, flipping, sliding, or zooming-in/out an object, with a variety of input modalities and interaction techniques. Such physical interactions influence a variety of outcomes such as users’ attitudes and behavioral intentions (Brown, 2014; Sundar, Xu, Bellur, Oh, & Jia, 2011), as well as actual behaviors, as discussed below. In other words, an important facet of user engagement as a behavioral experience includes the many tangible ways in which users voluntarily interact with an interface: physical interaction.
There is ample evidence that the media industry is recognizing these changes. For instance, the IAB recognized cost-per-engagement as the new advertising pricing model (Morrissey, 2008; Stanley, 2013), instead of the cost-per-click (CPC) or cost-per-thousand (CPM) models that prevailed earlier. In a recent conceptualization of engagement, the IAB (2014) defines it as “a spectrum of consumer advertising activities and experiences—cognitive, emotional, and physical—that will have a positive impact on a brand” (p. 6). Performance-based ads, also called AdFrames, invite users to interact with them in various ways, such as rating an ad, forwarding it to friends, and commenting on it. Recently, Google released a new advertising service called “Engagement Ad” (Cohen, 2014), whereby advertisers can easily personalize their content for target consumers using rich media and pay only if a user mouses over the ad for more than two seconds. The creation of such engagement-based metrics, which rely heavily on active user participation, reiterates the need to explore the phenomenon of user engagement from a behavioral perspective.
A related form of user engagement in interactive media is the triaging and organizing of content for future use. When users are engaged with a website, they can bookmark the website, thereby creating collective clusters of bookmarks. Social bookmarking websites, such as Delicious or Reddit, enable users to take further actions, such as managing, categorizing, and sharing their personal collection of links, which can be considered a more advanced stage of user engagement than merely reading the website (Sharma, 2011).
Finally, the most important aspect of user engagement is social outreach that extends their experience with media into the offline realm. Studies in marketing and social media epitomize this behavioral dimension of user engagement with one word: “viral.” Virality of messages is based on users’ voluntary behavior of forwarding the brand message to other users (Dobele, Toleman, & Beverland, 2005). Scholars (Berger, 2013) have tried to deconstruct how content generation and sharing have both become highly “contagious,” urging the social media industry to focus on network-level outreach metrics such as cost-per-followers on Twitter (Cha, Haddadi, Benevenuto, & Gummadi, 2010) and cost-per-like on Facebook (Indvik, 2013). In this case, user engagement refers to the behavior of sharing and exchanging one’s experience with the product or service with other like-minded users, which contributes to making a message or product go “viral.” This social outreach behavior of users, contributing to the rapid and widespread distribution of positive or negative messages, is a critical form of user engagement in the current online media landscape.
Thus, we propose that user engagement as a behavioral experience includes both physical interaction with interface as well as taking further actions on the content such as managing and sharing it, which we term digital outreach:
Therefore, our new definition of user engagement is a form of user experience which includes both (1) a psychological state where the user appraises the quality of media and becomes absorbed in media content and (2) a behavioral experience in which the user physically interacts with the interface and also socially distributes and manages the content.
In the next section, we propose a new conceptual model of user engagement with four critical components—physical interaction, interface assessment, absorption, and digital outreach—and specify the relationships among the four factors of user engagement and their operational definitions.
Conceptual Model of User Engagement
Physical Interaction
The first step toward engagement is to involve all our physical actions in a specific task or interaction context. Operationally, physical interaction can be defined as the amount of observable activity of users with the interface. Eye-tracking software can be applied to directly measure this type of viewer engagement on a computer screen (Dreze & Hussherr, 2003). Usage of various multimedia features on a website (e.g., the number of times a user clicked on hyperlinks or tapped on a haptic interface) can also represent the amount of physical interaction with the system.
With the availability of data and screen capturing software (e.g., Silverback™; Camtasia™), such measurements of users’ physical interactions with the interface have become easier and more popular in academia as well as industry. In fact, click-through rates have been a popular measure of user engagement in web advertising (Dreze & Hussherr, 2003; Lohtia, Donthu, & Hershberger, 2003; Sokolik, Magee, & Ivory, 2014), but they do not show whether users actually paid attention to the content after the initial click. Given this, popular web analytics tools such as Google Analytics have used average reading time as an approximate measure of users’ attention to the content. Studies from Yahoo Labs have used objective measures of user engagement such as cursor movement (Arapakis, Lalmas, & Valkanas, 2014), eye tracking (Arapakis, Lalmas, Cambazoglu, Marcos, & Jose, 2014), and the time between two visits from the same user (Dupret & Lalmas, 2013).
From the preceding discussion, it is clear that in order to capture deliberate user actions, we need to move beyond clicks and include other actions that convey the user’s attempts to engage with the interface and its contents. With this in mind, we operationalize physical interaction with a website as various mouse-based actions (click, slide, drag, and mouseover) that are deployed by the user to access distal information. Specifically, we capture the average number of clicks/slides/drags/mouseovers on hotspots that reveal hidden (or embedded) information upon users’ interaction, and the average time spent reading the information in each hotspot after they have performed the action to pull it up.
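As a rough illustration of how these two indicators could be derived from interaction logs, the sketch below computes the average reading time per hotspot and the average number of actions per hotspot for one participant. The log schema (`HotspotEvent`), the function name, and the choice to average over all hotspots on the page (counting unvisited hotspots as zero) are assumptions made for illustration; the logging software used in the experiments is not described at this level of detail.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HotspotEvent:
    """One logged interaction (hypothetical schema, not the study's own)."""
    hotspot_id: int
    action: str          # "click", "slide", "drag", or "mouseover"
    seconds_open: float  # time the revealed content stayed open


# Mouse-based actions counted as physical interaction in the text above.
COUNTED_ACTIONS = {"click", "slide", "drag", "mouseover"}


def physical_interaction(events, n_hotspots):
    """Return (mean seconds per hotspot, mean actions per hotspot).

    Assumption: both means are taken over all hotspots on the page,
    so a hotspot that was never opened contributes zero.
    """
    if n_hotspots <= 0:
        raise ValueError("n_hotspots must be positive")
    seconds = defaultdict(float)
    actions = defaultdict(int)
    for event in events:
        if event.action not in COUNTED_ACTIONS:
            continue  # ignore anything that is not a hotspot action
        seconds[event.hotspot_id] += event.seconds_open
        actions[event.hotspot_id] += 1
    mean_time = sum(seconds.values()) / n_hotspots
    mean_actions = sum(actions.values()) / n_hotspots
    return mean_time, mean_actions
```

Separating the time measure from the action count mirrors the distinction drawn above between merely clicking and actually attending to the revealed content.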
Interface Assessment
Interface assessment is defined as users’ initial evaluation of the interface. Theories of human information processing (Paivio, 1990) and memory systems (Baddeley, 1998) have illustrated how individuals make sense of incoming information by actively creating and manipulating new mental representations in working memory before making connections to pre-existing knowledge structures in long-term memory. Baddeley (2003) proposes the concept of an “episodic buffer” which acts as an intermediary step (fully available to conscious awareness) that helps in combining a variety of incoming signals. In human-website interaction, previous literature suggests that three criteria for evaluating the interface can dominate the user’s preliminary perceptual assessment of the interface: (1) natural mapping ability, (2) intuitiveness, and (3) ease of use (Blackler, Popovic, & Mahar, 2005; Norman, 1990; O’Brien & Toms, 2010; Sundar et al., 2014; Venkatesh & Davis, 2000).
Natural mapping ability of the interface refers to “the ability of a system to map its controls to changes in the mediated environment in a natural and predictable manner” (Steuer, 1992, p. 86). An ideal natural mapping strategy follows the natural action as closely as possible (Norman, 1990), such as flipping through images in a manner that is similar to flipping through a magazine (Sundar et al., 2014). Given that human perceptual and motor systems are optimized for real-life interaction (Biocca & Delaney, 1995), the natural mapping ability of the interface can function as a precursor for further engaging experiences such as a touch screen with gravity or inertia (e.g., iPhone’s gyroscope feature).
Intuitiveness of an interface refers to the extent to which it allows the user to unconsciously utilize stored experiential knowledge (Blackler et al., 2005) that can lead to effective interaction (Naumann & Hurtienne, 2010). When users perceive an interface as being intuitive, it is found to have desirable effects on user experience, such as facilitating efficient cooperation among users and increasing users’ enjoyment (Yoshida, Tijerino, Abe, & Kishino, 1995).
Natural mapping and intuitiveness are closely related to the perception of ease of use. A system that is easy to use requires little physical or mental effort. Studies have shown that perceived ease of use is a significant predictor of user attitudes and behavioral intentions toward interactive technologies (e.g., Lee, Fiore, & Kim, 2006; Venkatesh & Davis, 2000). Thus, interface assessment, as a preliminary stage of user engagement, can be operationalized as the extent to which the user perceives the interface as natural, intuitive, and easy to use, which would lead to further psychological involvement with media content.
Absorption
Whereas interface assessment and physical interaction are the starting points of information processing (wherein users may not be immersed in the content while activating their sensory mechanisms), absorption signals deeper involvement with the content. Previous studies have also measured this aspect of engagement in the form of narrative engagement, the degree to which a story evokes related experiences, elaboration about story content, and attention of readers measured via self-report items (Strange & Leung, 1999). Similarly, the extent to which the user perceives that his or her attention is focused on the interaction has also been measured as one of the components of user engagement (Webster & Ho, 1997). Thus, we operationalize this stage as being much further along in the engagement continuum, where the individual is consciously involved in an interaction, and more specifically with the content of the interaction, with almost complete attentional focus on the mediated environment.
In addition, we hypothesize that this stage is influenced by the two components that come before it—physical interaction and interface assessment. Previous studies found that the degree of absorption relies on how positively the user appraised the interface features (O’Brien & Toms, 2010; Sundar et al., 2014). Also, the amount of physical interaction has been found to be associated with users’ attitudes toward media content, although the use of actual behavioral metrics has been scarce. For instance, self-reported click-through frequency of Internet ads was correlated with users’ perception of the ads being attractive, eye-catching, and entertaining (Burns & Lutz, 2006). Furthermore, self-reported click-through intention was associated with personal relevance of advertising content (Cho, 1999).
Digital Outreach
We conceptualize digital outreach as a heightened phase of engagement which is marked by several behavioral (action-filled) indicators, for example, sharing content with other individuals in one’s personal and social networks, bookmarking the website for future use, and so on. Previous studies have suggested that individuals are likely to share stories in conversation (Peters, Kashima, & Clark, 2009) as well as in social media (Stieglitz & Dang-Xuan, 2013) when they are emotionally engaged with the content. Behavioral outcomes of engagement have been operationalized as users’ intent to share the content with or email it to friends, family members, and coworkers (Berger, 2011); information dissemination in social media such as retweeting (Stieglitz & Dang-Xuan, 2013); and bookmarking the content for future use or the intent to revisit it (Hu & Sundar, 2010). In this study, we define digital outreach as all possible forms of behavioral interaction with online content, including social transmission of the content, content management such as bookmarking, and repeated use of the content, which is influenced by the three previous components of user engagement.
In sum, the four components proposed thus far could fall along an engagement continuum, ranging from the starting point of engagement (physical interaction and interface assessment) to subsequent stages of engagement (absorption and digital outreach). Thus, we propose the following hypothesis:
Figure 1 summarizes our hypotheses.

Figure 1. Hypothesized relationships among the four aspects of user engagement.
Attitudinal and Cognitive Outcomes
Finally, it is important to distinguish user engagement from its consequences, such as evaluations (e.g., attitudes toward the interface) or cognitive outcomes (e.g., learning). Previous literature suggests that when well-designed interactive features give users feelings of absorption and presence in the browsing task, their attitudes toward the website and even the content can be positively affected (Klein, 2003; Li, Daugherty, & Biocca, 2002; Xu & Sundar, 2012). Also, interface qualities such as ease of use, naturalness, and intuitiveness have been significant predictors of users’ attitudes toward the website (Teo, Oh, Liu, & Wei, 2003) as well as its content (Sundar et al., 2014). Studies have also found that learning outcomes can be enhanced when users are engaged with interactive media by actively controlling the pace of incoming information and interactive simulation (Evans & Gibbons, 2007; Salajan et al., 2009). Thus, we also examine whether the four components of user engagement that we have proposed successfully predict users’ attitudes and recall memory of the content. We do so by constructing a structural equation model that includes all four components of the user engagement continuum, along with key proposed outcomes, namely, attitudes toward the interface and its content, as well as recall memory for the information conveyed.
Method
We combined two data sets of undergraduate students interacting with a stimulus website in two lab experiments (N = 263). The two data sets were standardized first and then combined. In order to test the conceptual model of user engagement (Figure 1), we first tested the four-factor model of user engagement with a CFA. We then employed a structural equation model to test whether our data fit the continuum model of user engagement proposed in our hypotheses. Finally, we examined the relationships between the four factors of user engagement and other relevant outcome variables, including attitudes toward the website and content and recall memory, in order to demonstrate predictive validity.
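The standardize-then-combine step can be sketched as follows. This is a minimal illustration that assumes z-scoring within each experiment using the sample standard deviation; the exact standardization procedure is not specified in the text, and the function names are hypothetical.

```python
from statistics import mean, stdev


def zscore(values):
    """Standardize one variable within a single data set.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - m) / s for v in values]


def standardize_and_pool(study1, study2):
    """Z-score each study separately, then concatenate the results."""
    return zscore(study1) + zscore(study2)
```

Standardizing within each experiment before pooling puts measures that were scored differently across the two studies on a common metric, so that differences in scale do not masquerade as differences between participants.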
Participants
Participants (N = 263; mean age = 23.85; 171 female) were recruited from undergraduate classes at a large university in the United States. Each received US$15 for browsing a website constructed for the study and completing a questionnaire. They were instructed to explore all the tabs and try to learn as much as they could from the website, with no time constraint on their browsing.
Stimulus
In the first experiment, six prototype websites were constructed based on an online magazine story titled “Redwoods: Living Giants,” developed by NationalGeographic.com (http://ngm.nationalgeographic.com/2009/10/redwoods/redwoods-interactive). The six prototypes differed only in the type of interaction technique they offered to the users: Click-to-download, Mouseover, Drag, Slide, Zoom-in/out, and 3D carousel. The website showed a timeline of Redwoods where nine “hotspots” were placed to show additional information upon interaction using one of the aforementioned interaction techniques. Each of the six interaction techniques afforded a different action, so that together they capture the gamut of physical interactions with websites. While click-to-download (i.e., clicking the mouse) and mouseover (placing the cursor over the hotspots) involve relatively effortless interactions, slide and drag require more coordination to map mouse actions to changes on the screen: users explore the hotspots by sliding or by dragging a small object to the hotspots. Zoom-in/out and 3D carousel are more complex, requiring not only coordination but also repetitive actions to control the size or flow of information. Users zoom in and out of small thumbnails by repeatedly clicking them, or control the flow of rotating thumbnails by hovering over a specific image of choice. By affording variety, these six techniques were expected to generate variability in our measure of physical interaction. See Sundar et al. (2014) for more details about the methods used in this experiment.
The second experiment expanded this variability by employing combinations of interaction techniques, with a four (Click only, Click + Slider, Click + Slider + Drag, Click + Slider + Drag + Mouseover) by two (presence/absence of 3D carousel) factorial design. Eight stimulus websites were constructed based on the “Guitar/Bass Timeline” website (http://www.empsfm.org/flash/guitbass/index.html). Each version contained a homepage with eight guitars and basses, and subpages with three hyperlinks showing detailed information about each instrument. The presence or absence of the 3D carousel was manipulated on the homepage, where participants interacted with either a 3D carousel of eight guitars/basses or a horizontal array of guitars and basses in the form of still images (non-3D carousel). A slider was embedded in the timeline so that users could access different instruments based on their time period, by simply moving their mouse along the slider. Drag was embedded in the “Specification” link, wherein participants dragged a guitar pick over specific points of interest (referred to as “hotspots” hereafter) on the instrument to obtain additional information. Mouseover revealed textual information when the user rolled the mouse over the image of each guitar. See Sundar, Bellur, Oh, and Jia (2011) for more details.
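To make the four-by-two factorial design concrete, the short sketch below enumerates the eight stimulus conditions by crossing the technique combinations with the presence or absence of the 3D carousel. The variable names and the dictionary layout are illustrative assumptions, not part of the original materials.

```python
from itertools import product

# Four cumulative combinations of interaction techniques (factor 1).
TECHNIQUE_LEVELS = [
    ("Click",),
    ("Click", "Slider"),
    ("Click", "Slider", "Drag"),
    ("Click", "Slider", "Drag", "Mouseover"),
]

# Presence or absence of the 3D carousel on the homepage (factor 2).
CAROUSEL_LEVELS = [True, False]

# Fully crossing the two factors yields the stimulus websites.
conditions = [
    {"techniques": techniques, "carousel_3d": carousel}
    for techniques, carousel in product(TECHNIQUE_LEVELS, CAROUSEL_LEVELS)
]
```

Each entry in `conditions` corresponds to one version of the stimulus website.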
Measurement
All self-reported measures used a 7-point Likert-type scale. Descriptive statistics of the first experiment and the second experiment are reported separately in Table 1. The means and standard deviations from Study 1 are presented first in the paragraphs below with subscript 1. Study 2 descriptive data are included within the same parentheses, but are indicated by subscript 2.
Table 1. Items Used in the Measurement Model of User Engagement.
Note. Values from the first experiment appear to the left of the slash (/). Values from the second experiment appear to the right of the slash (/).
Physical interaction was measured via time spent on hotspots (PI1) and the number of user actions with hotspots (PI2) as explained in the theory section. Time spent on hotspots was automatically captured by log data and averaged (M1 = 11.98 s, SD1 = 6.83, Min1 = 0, Max1 = 40.40; M2 = 7.42 s, SD2 = 4.47, Min2 = .03, Max2 = 20.65; PI1). This is a sensitive measure which captures only the time spent reading website content on each hotspot, rather than the time spent browsing the entire website. The log data also counted any mouse-based interactions with hotspots including click, slide, drag, and mouseover and calculated the average frequency of accessing hidden content (M1 = 1.72, SD1 = 1.2, Min1 = 0, Max1 = 6; M2 = 1.42, SD2 = .69, Min2 = .06, Max2 = 3.78; PI2).
Interface assessment was measured via three items. Natural mapping was measured by “The way that I used to control the changes on the website seemed natural” (M1 = 4.76, SD1 = 1.13; M2 = 4.93, SD2 = 1.39; IA1). Intuitiveness was measured by “My interaction with the website was intuitive” (M1 = 5.28, SD1 = 1.10; M2 = 5.34, SD2 = 1.30; IA2). Both were adapted from Witmer and Singer (1998). Perceived ease of use was measured by a semantic differential question adapted from Davis (1989), ranging from “difficult to browse” to “easy to browse” (M1 = 5.81, SD1 = 1.21; M2 = 5.68, SD2 = 1.41; IA3).
Absorption was measured via three items including “While browsing the website content, I was absorbed in what I was doing” (M1 = 4.19, SD1 = 1.51; M2 = 4.86, SD2 = 1.39; AB1), “While browsing the website content, I was immersed in what I was doing” (M1 = 3.99, SD1 = 1.44; M2 = 4.58, SD2 = 1.52; AB2), and “While browsing the website content, my attention did not get diverted” (M1 = 3.72, SD1 = 1.65; M2 = 4.59, SD2 = 1.62; AB3), taken from Agarwal and Karahanna (2000).
Digital outreach was measured via four items, including “I would bookmark this website” (M1 = 1.79, SD1 = 1.05; M2 = 2.37, SD2 = 1.58; DO1), “I would recommend this website to others” (M1 = 2.65, SD1 = 1.49; M2 = 3.32, SD2 = 1.79; DO2), “I would forward this website to my acquaintances” (M1 = 1.89, SD1 = 1.09; M2 = 2.73, SD2 = 1.69; DO3), and “I would visit this website again in the future” (M1 = 2.47, SD1 = 1.43; M2 = 2.87, SD2 = 1.71; DO4), adapted from Hu and Sundar (2010). Table 1 provides all items in the measurement model.
Attitudes toward the website were measured by asking participants to indicate how well eight items (comfortable, organized, involving, good, useful, coherent, sophisticated, and user friendly) describe the website (M1 = 5.10, SD1 = .90, α1 = .90; M2 = 5.12, SD2 = .93, α2 = .88), adapted from Sundar (2000).
Attitudes toward content were measured by asking participants to indicate how well four adjectives (concise, informative, insightful, and lively) describe the content of the story/feature on the website (M1 = 4.91, SD1 = .81, α1 = .87; M2 = 4.96, SD2 = .83, α2 = .86; Sundar, 2000).
Recall memory of content was measured by an open-ended question. In the first experiment, participants were asked to write down the names of different historical events that were mentioned in the website. The website contained nine events in all. The answers were coded by counting the number of correct historical events that they recalled (M1 = 4.56, SD1 = 1.88, Min1 = 0, Max1 = 8). In the second experiment, participants were asked to list any information that they could remember from the website content. The answers were fragmented into thought units first, following the method prescribed by Shen and Dillard (2009), and coded by counting the number of correct facts pertaining to the guitars described on the website (M2 = 2.32, SD2 = 2.07, Min2 = 0, Max2 = 10).
Results
We first tested internal reliability, convergent validity, and discriminant validity of the measurement model of user engagement with a CFA. Next, we examined our four-factor model of user engagement with a second-order CFA, and tested the user engagement continuum in Figure 1 with a structural equation modeling (SEM) analysis. In order to assess predictive validity of the user engagement scale, we further examined the scale’s relations with website attitudes, content attitudes, and recall memory of content.
Measurement Model of User Engagement
A CFA showed that the measurement model had acceptable fit indices: χ2 = 77.85 (df = 48, p < .01), root mean square error of approximation (RMSEA) = .05 (90% confidence interval [CI] = [.03, .07]), PCLOSE = .52, adjusted goodness-of-fit index (AGFI) = .93, comparative fit index (CFI) = .98, and Tucker–Lewis index (TLI) = .97. The factor loadings of individual items were all over .70 except for .65 for the time spent on hotspots (PI1), and .56 for ease of use (IA3). The results also showed significant correlations among most of the four factors. Physical interaction was significantly correlated with absorption (r = .26, p < .01), suggesting that the more users physically interact with the website content, the more they feel absorbed while browsing the site. Physical interaction did not show significant correlations with interface assessment and digital outreach, which will be further explained in the discussion section. Interface assessment showed significant correlations with both absorption (r = .35, p < .01) and digital outreach (r = .36, p < .01). Users who appreciated the naturalness, intuitiveness, and ease of use of the interface were more likely to feel absorbed while browsing and also to socially distribute and manage the content after browsing the website. Absorption was significantly correlated with digital outreach as well (r = .43, p < .01), suggesting that the more users feel absorbed while browsing, the more likely they are to manage and distribute the website content.
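The reported RMSEA can be checked against the reported chi-square, degrees of freedom, and sample size. The sketch below uses the common single-group point-estimate formula, sqrt(max(χ2 − df, 0) / (df × (N − 1))). Whether the authors' software used exactly this variant is an assumption, and the function name is illustrative.

```python
import math

def rmsea(chi_square, df, n):
    """Point estimate of RMSEA from a model chi-square, its df, and sample size n.

    Uses the common single-group formula with N - 1 in the denominator;
    some programs use N instead, which changes the result only slightly.
    """
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))

# Values reported for the first-order measurement model (N = 263).
measurement_rmsea = rmsea(77.85, 48, 263)
print(round(measurement_rmsea, 2))  # 0.05
```

Under these assumptions the computed value rounds to the reported .05.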
Reliability
In order to assess internal reliability, we calculated the composite reliability as well as Cronbach’s alpha. As shown in Table 2, the only latent variable that failed to meet the .70 criterion of composite reliability was physical interaction. Other latent variables showed composite reliability greater than .70. Cronbach’s alphas for interface assessment, absorption, and digital outreach were .72, .87, and .89, respectively. For physical interaction, the bivariate correlation between time spent on hotspots and number of user actions was .46 (p < .01).
Table 2. Cronbach’s Alpha, CR, AVE, MSV, and ASV of Latent Variables.
Note. CR = composite reliability; AVE = average variance extracted; MSV = maximum shared variance; ASV = average shared variance.
Zero-order correlation was calculated for physical interaction instead of Cronbach’s alpha.
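For readers who want to reproduce the two reliability indices, the following is a minimal sketch of the standard formulas. Cronbach's alpha is computed from raw item scores, and composite reliability is computed from standardized factor loadings under the assumption of uncorrelated measurement errors. The function names and the example inputs are hypothetical and are not taken from the study's data.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha.

    items: list of columns, one per item, each holding that item's scores
    across the same respondents in the same order.
    """
    k = len(items)
    totals = [sum(row) for row in zip(*items)]          # each respondent's scale total
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

def composite_reliability(loadings):
    """Composite reliability from standardized loadings (uncorrelated errors assumed)."""
    loading_sum_sq = sum(loadings) ** 2
    error_sum = sum(1 - l * l for l in loadings)        # error variance per indicator
    return loading_sum_sq / (loading_sum_sq + error_sum)

# Hypothetical inputs for illustration only.
example_alpha = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
example_cr = composite_reliability([0.7, 0.7, 0.7])
```

Because composite reliability depends on the number of indicators as well as their loadings, a factor with only two indicators, such as physical interaction, needs comparatively high loadings to reach the .70 criterion.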
Convergent and discriminant validity
We also calculated average variance extracted (AVE), maximum shared variance (MSV), and average shared variance (ASV) in order to assess convergent validity and discriminant validity of the four subscales (Hair, Black, Babin, & Anderson, 2010). The four subscales showed good convergent and discriminant validity. As reported in Table 2, the composite reliability for each subscale was larger than the AVE, showing that each latent variable has convergent validity. Also, the AVE of each scale was larger than both MSV and ASV, which demonstrates discriminant validity by showing that each latent factor was better explained by its own observed variables than by other observed variables from a different factor.
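The three validity indices can be sketched as follows. AVE is the mean squared standardized loading of a factor, while MSV and ASV are the maximum and the mean of its squared correlations with the other factors. The absorption correlations used below are the ones reported for the measurement model above; the loadings are hypothetical placeholders, and whether Table 2 was computed from exactly these inputs is an assumption.

```python
def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings of one factor."""
    return sum(l * l for l in loadings) / len(loadings)

def shared_variance(correlations):
    """Return (MSV, ASV) from a factor's correlations with the other factors."""
    squared = [r * r for r in correlations]
    return max(squared), sum(squared) / len(squared)

# Correlations of absorption with physical interaction, interface assessment,
# and digital outreach, as reported for the measurement model.
msv, asv = shared_variance([0.26, 0.35, 0.43])

# Hypothetical loadings for a three-item factor (not the study's estimates).
ave = average_variance_extracted([0.8, 0.8, 0.8])

# Discriminant validity in the sense used here: AVE exceeds both MSV and ASV.
discriminant_ok = ave > msv and ave > asv
```

With these inputs, MSV is .43 squared (about .18) and ASV is .125, so any AVE above roughly .18 would satisfy the discriminant validity comparison for absorption.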
Four-Factor Model of User Engagement
In order to examine the four-factor model of user engagement, a second-order CFA was conducted with a higher-order latent variable representing overall user engagement. Goodness-of-fit tests indicated that the model with the higher-order latent variable fit the data quite well: χ2 = 82.21 (df = 50, p < .05), RMSEA = .05 (90% CI = [.03, .07]), PCLOSE = .50, AGFI = .93, CFI = .98, and TLI = .97. All path coefficients were statistically significant at p < .05 except for a marginally significant path from physical interaction to the number of user actions (p = .07; Figure 2). Thus, our data supported the four-factor model of user engagement.

Figure 2. Four-factor model of user engagement (N = 263).
User Engagement Continuum
The user engagement continuum proposed in Figure 1 was examined with SEM analysis. Indices suggested a good model fit: χ2 = 88.46 (df = 50, p < .05), RMSEA = .05 (90% CI = [.04, .07]), PCLOSE = .34, AGFI = .92, CFI = .97, and TLI = .97. All paths were statistically significant at p < .01, except for the covariation between the two exogenous variables: physical interaction and interface assessment (Figure 3). As we predicted in H3, a greater amount of physical interaction with the interface and a more positive assessment of the interface elicited more cognitive absorption with the content, which in turn predicted greater behavioral intention to manage the content for future usage and socially distribute it.

Figure 3. User engagement continuum (N = 263).
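The structure of the continuum model can be written down compactly, and its degrees of freedom can be checked by counting parameters. The sketch below is illustrative only. The latent-variable names, the lavaan-style syntax output, and the identification choice (one loading per factor fixed as a marker) are assumptions, since the article does not report the software specification.

```python
# Indicator labels follow the article; latent-variable names are illustrative.
measurement = {
    "physical": ["PI1", "PI2"],
    "interface": ["IA1", "IA2", "IA3"],
    "absorption": ["AB1", "AB2", "AB3"],
    "outreach": ["DO1", "DO2", "DO3", "DO4"],
}
# Structural paths: outcome -> list of predictors.
regressions = {
    "absorption": ["physical", "interface"],
    "outreach": ["absorption"],
}
# Covariance between the two exogenous factors.
covariances = [("physical", "interface")]

def to_syntax(measurement, regressions, covariances):
    """Render the model in lavaan-style syntax (=~ loadings, ~ paths, ~~ covariances)."""
    lines = [f"{f} =~ {' + '.join(items)}" for f, items in measurement.items()]
    lines += [f"{dv} ~ {' + '.join(ivs)}" for dv, ivs in regressions.items()]
    lines += [f"{a} ~~ {b}" for a, b in covariances]
    return "\n".join(lines)

def degrees_of_freedom(measurement, regressions, covariances):
    """Unique variances/covariances of the indicators minus free parameters."""
    n_obs = sum(len(items) for items in measurement.values())
    moments = n_obs * (n_obs + 1) // 2
    free_loadings = n_obs - len(measurement)   # one marker loading fixed per factor
    error_variances = n_obs
    factor_variances = len(measurement)        # variances or disturbances, one per factor
    paths = sum(len(ivs) for ivs in regressions.values())
    free = (free_loadings + error_variances + factor_variances
            + paths + len(covariances))
    return moments - free

model_syntax = to_syntax(measurement, regressions, covariances)
model_df = degrees_of_freedom(measurement, regressions, covariances)
print(model_df)  # 50
```

With 12 indicators there are 78 unique variances and covariances; under the stated assumptions the model has 28 free parameters, which leaves 50 degrees of freedom, in agreement with the reported df.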
Predictive Validity
As explained in the theory section, previous literature suggested that user engagement can predict users’ attitudes (e.g., O’Brien & Toms, 2008; Klein, 2003; Li et al., 2002; Teo et al., 2003; Webster & Ho, 1997; Xu & Sundar, 2012) and learning outcomes (e.g., Evans & Gibbons, 2007; Jacques et al., 1995; Salajan et al., 2009; Stoney & Oliver, 1999). Thus, we added attitudes toward the website, attitudes toward content, and recall memory to the structural model of user engagement and examined whether the four factors predicted these outcomes. We first started with a model where digital outreach, as the final component of the structural model of user engagement, predicts all three outcomes. Then we iteratively removed paths from digital outreach and added paths from physical interaction and interface assessment, guided by modification indices and theory. The final model included paths from (1) physical interaction to recall memory, (2) interface assessment to attitudes toward the website and content, (3) absorption to attitudes toward the website, and (4) digital outreach to attitudes toward the website and content.
The model fit indices were acceptable: χ2 = 545.68 (df = 261, p < .001, χ2 / df = 2.09), RMSEA = .07 (90% CI = [.06, .07]), PCLOSE = .001, AGFI = .83, CFI = .91, and TLI = .90 (Figure 4). In general, the four factors turned out to be strong individual predictors of specific attitudinal and learning outcomes. Physical interaction was the only predictor of participants’ recall memory (β = .34, p < .001). The more time participants spent on hotspots and the more they interacted with hotspots, the more website content they were able to recall. On the other hand, interface assessment predicted both attitudes toward the website (β = .57, p < .001) and attitudes toward content (β = .35, p < .001), showing that the naturalness, intuitiveness, and ease of use of an interactive interface successfully enhanced participants’ attitudes toward the website and content. Similar to interface assessment, digital outreach predicted both attitudes toward the website (β = .42, p < .001) and content (β = .40, p < .001). Finally, absorption led to better website attitudes (β = .18, p < .05), showing that more absorbed participants evaluated the website as more comfortable, organized, involving, good, useful, coherent, sophisticated, and user friendly.

Figure 4. Predictive validity (N = 263).
Summary of Results
The measurement model showed that the four subscales (Physical Interaction, Interface Assessment, Absorption, and Digital Outreach) had internal reliability, convergent validity, and discriminant validity. Both the four-factor model of user engagement and the structural model of user engagement showed acceptable fit. Finally, a predictive validity test showed that the overall index and the four subscales are significantly associated with attitudinal and learning outcomes as previous literature suggested (Evans & Gibbons, 2007; Klein, 2003; Li et al., 2002; Salajan et al., 2009; Teo et al., 2003; Xu & Sundar, 2012). Interface assessment, absorption, and digital outreach predicted attitudes toward the website and/or content, whereas physical interaction predicted recall memory.
Discussion
Our goal was to go beyond concept explication by offering both a comprehensive conceptual model and an empirically verifiable measurement. In the sections below, we highlight the implications and contributions of our approach.
Four-Factor Model of User Engagement
Our data clearly show that the four-factor model of user engagement is reliable and valid. As can be seen in Figure 2, the factor loadings for individual items were greater than .60, with the exception of the factor loadings for the number of user actions and ease of use. Even these factor loadings are greater than .50. Other indices such as Cronbach’s alpha and composite reliability show that the four factors reliably constitute the concept of user engagement, along with both convergent validity and discriminant validity.
The four factors that we have identified are indeed critical to defining user engagement, in that they capture different aspects of the concept. Consistent with our conceptual model, our data show that both the psychological dimension and the behavioral dimension are necessary to comprehensively define user engagement in the context of interactive media. From a behavioral perspective, markers of user engagement could take on the following forms: (1) clicking activity, (2) amount of time spent exploring various interface features and associated content, and (3) willingness to manage the content and distribute or share such forms of engagement socially. From a psychological perspective, user engagement is evidenced in the following ways: (1) subjective assessments of the interface itself (regardless of content) as being natural, intuitive, and easy to use, which in turn, could enhance (2) the perception of being absorbed in the content.
The current model of user engagement overlaps with the user engagement scale by O’Brien and Toms (2010), which includes perceived usability (affective and cognitive responses to the system) and aesthetic appeal of the interface (perception of the visual appearance of the interface), two dimensions that are conceptually similar to interface assessment in our model. The absorption factor in our model includes items similar to those in the felt involvement (feelings of being interested and having fun during the interaction) and focused attention (feelings of absorption and temporal dissociation) factors in their study. Digital outreach in our model also shows some parallels with the endurability factor in their model, such as users’ likelihood to return to and recommend the website. Compared with the previous study, one of the contributions of our approach lies in the consideration of both behavioral and psychological dimensions of user engagement. We have not only used self-reported items but also combined the number of meaningful clicks and time spent reading the content in our model, thereby highlighting the utility of behavioral markers in differentiating the role of user engagement in interactive versus non-interactive media.
A Process Model of User Engagement: User Engagement Continuum
Apart from proposing four critical factors, the continuum of user engagement, as seen in Figure 3, also allows us to decompose the process into engagement with the interface (system) and engagement with the content, two cumulative stages of user experience with any form of media. Our findings highlight the need to distinguish between interface-level engagement (physical interactions and interface assessment) and content-level engagement (absorption and digital outreach). Previous studies by O’Brien and colleagues also found that users’ initial perceptions of the usability or aesthetic appeal of the interface are different factors from users’ sustained involvement with the content that comes afterwards (O’Brien, 2011; O’Brien & Lebow, 2013; O’Brien & Toms, 2010). Extending this previous work, our model adds nuance and clarity by formally distinguishing between two different loci of engagement and the distinct user response associated with each.
Similar to our process model, O’Brien and Toms (2010) previously found that aesthetic appeal enhanced felt involvement and focused attention, which in turn increased the endurability of the experience, such as users’ likelihood to return to or recommend the website to others. Compared with their path model, the placement of physical interaction at the beginning of the user engagement continuum is a unique contribution of this study. Physical interaction and interface assessment are shown to independently predict absorption with content, which in turn predicts digital outreach. A reasonable inference is that physical interaction and interface assessment are two distinct facets of the concept at the point of initiating the engagement process. The former captures users’ exploration of the interface, whereas the latter reflects users’ initial evaluation of the interface features. Together, these initial experiences appear to dictate further absorption into media content and digital outreach (i.e., content management and sharing behaviors). As avid information processors, we first react to the “bells and whistles” (Sundar, Kalyanaraman, & Brown, 2003) present in rich media environments, by physically interacting with them and/or cognitively evaluating their ease of use, naturalness, and intuitiveness at the beginning phase of user engagement. Our results suggest that if this stage successfully engages us, we are more likely to systematically process the content and information enclosed in these rich media forms, including further absorption and outreach.
Past research on the concept of user engagement has focused more on defining and understanding its role as an intermediary variable in different communication processes (Busselle & Bilandzic, 2009), with scant emphasis on examining the initial factors, especially medium- or interface-related features, that could be triggering user engagement in the first place. Although other process models have examined individual-based pre-engagement factors (e.g., interest and awareness factors proposed by Napoli, 2011), our model focuses specifically on medium- or technology-based attributes and addresses this gap by situating active physical interactions with various interface features as a precursor to subsequent components of user engagement. According to Sundar (2007), three characteristics of media systems (customization, multimodality, and interactivity) have the potential to serve as triggers and engage users even before they have had a chance to explore the content in great detail. This study offers a process-based view of this approach. The main argument of this study is that user engagement with peripheral cues belonging to the media system or interface could trigger more systematic user engagement with content. Interactive websites can encourage users to actively explore the interface, leading to a positive preliminary assessment of the interface, which can further enhance their absorption and behavioral intention to share the content with others.
In the language of dual-process theories of communication, such as the elaboration likelihood model (Petty & Cacioppo, 1986) and the heuristic-systematic model (Chaiken, Liberman, & Eagly, 1989), the user engagement continuum presented here will allow us to explore the possibility that peripheral interface features can lead to central processing, with inviting tools of interactive media interfaces prompting more systematic processing of content. Therefore, this research has implications for technology theories, which propose that aspects of the interface significantly influence user attitudes and behaviors by engaging them with interface features. By identifying specific interface features that are capable of triggering user engagement, we are also informing the design and development of newer media interfaces that are geared toward promoting sustained user engagement.
Predictive Potential of User Engagement
Furthermore, we have attempted to separate the antecedents and consequences of user engagement, while focusing on describing the key components that constitute engagement from its early (physical interactions and interface assessment) to advanced (absorption and digital outreach) phases. This clarity ensures that presumed causes of engagement (e.g., sensory appeal, control, challenge, novelty, curiosity, etc.) and presumed outcomes (e.g., flow, presence, enjoyment, attitudes toward website and content, recall memory, etc.) are not clouding the definition of the concept itself.
According to the predictive validity test, each aspect of engagement has its own value: absorption can significantly enhance attitudes toward the website; the interface assessment and digital outreach aspects of user engagement can enhance attitudes toward both the website and its content; and physical interaction with the website leads to better recall of content.
Among the four factors, absorption measures most directly address the degree to which participants pay attention to media content. When participants feel absorbed and immersed during browsing, they appreciate the entire website as more comfortable, organized, useful, sophisticated, user friendly, and so on. User engagement measures proposed by previous studies also point out that focused attention or cognitive and affective involvement are key to inducing positive attitudes toward the system and further engagement with content, such as identification with characters (Busselle & Bilandzic, 2009; O’Brien & Toms, 2010).
The positive effect of physical interaction on recall memory implies that the two indicators (time spent on hotspots and the number of user actions with hotspots) are indeed useful for predicting learning outcomes. Compared with click-through rates that count every click, counting any mouse-based actions on specific hotspots on the website could be a better indicator of users’ attention to the content (Lipsman, 2012). This is also consistent with previous studies, which have noted that recall is more likely to be a function of attention to content, moderated by information processing capacity of users, rather than being directly influenced by interface features alone (Lang, 2000; Sundar et al., 2014).
It should be noted that interface assessment directly predicts attitudes toward the website and content without going through further examination of the website or the content. The face value of an interface, such as its naturalness, intuitiveness, and ease of use, can be so strong that it can determine users’ evaluation of the whole website and the quality of content delivered by the website. This is consistent with the MAIN model (Sundar, 2008) which argues that interface cues can dictate credibility assessment of the content by triggering various cognitive heuristics about its positive attributes. Future research would do well to identify which interactive tool triggers which heuristic, in order to better understand the predictive potential of interface assessment.
Limitations
Our study tested only a limited set of interaction techniques—six different mouse-based interaction techniques on the website and their combinations. Although mouse-based interaction techniques (e.g., click-to-download, mouseover, slide, drag, zoom-in/out, and 3D carousel) are popular and widely used in the current media environment, our model of user engagement should be re-examined in other interfaces such as gesture-recognition or haptic interfaces. In terms of measurement techniques, future studies should try to incorporate objective measures from industry, for example, a variety of time measures, cursor movement, and eye tracking data (Arapakis, Lalmas, Cambazoglu, et al., 2014; Arapakis, Lalmas, & Valkanas, 2014; Dupret & Lalmas, 2013), in addition to self-reported items and click data. As the IAB (2014, p. 13) working group observes, it is important to move beyond the “legacy of the click” to consider other factors of engagement that tap into cognitive and affective dimensions of the concept.
Future research should also consider that the data gathered in this study need to be interpreted in light of other factors, which our model does not consider, for example, content characteristics (e.g., appraisal of content quality, source credibility, message strength, etc.) and user characteristics in the form of individual differences (e.g., need for cognition, information processing capacity, technological competence).
Our sample consisted of college students only. Future studies ought to include more general adult samples and test the generalizability of our findings.
Conclusion
In sum, user engagement is a multi-faceted concept that captures the process of a media user’s progression from interacting with the interface physically to becoming cognitively immersed in the content offered by it and then on to proactively spreading the outcomes of this involvement, as well as managing content for future use. The four factors identified by this study show predictive validity for attitudinal and learning outcomes. Industry experts and research scholars are beginning to discover that sustained and continued user engagement has several outcomes that could be beneficial to all the stakeholders involved, most notably users themselves, as they continue to generate and disseminate content in various ways, resulting in some content becoming “viral.” Experts claim that building a dialogue with users and creating long-term social connections are the larger goals of user engagement research (Hanna et al., 2011). By explicating the concept and validating a four-part model, our study offers researchers and practitioners a concrete tool for empirically assessing user engagement when they create interactive media for promoting connections and dialogue with users.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is supported by the U.S. National Science Foundation under Grant IIS-0916944.
