Abstract
By providing a complete record of time use for a given population, time use studies enable investigators to test various hypotheses concerning that behavior. However, the large number and variety of activity combinations that are relevant in time allocation choices and, therefore, time use analysis, makes measuring or even fully identifying all of them impossible without the proper data mining tools. In this paper, we propose a framework for mining sequences of activities to capture more complex patterns than those currently available on how individuals organize their days. The proposed framework was applied to the American Time Use Surveys (ATUS) dataset to explore individual time allocation behavior, identifying sequences of activities that are frequent. For example, patterns such as the preferred activities that are performed before and after specific activities (such as paid work or leisure) are discussed in terms of their frequency. Such patterns are not easy to reveal using traditional descriptive analysis.
Introduction
The amount of time that individuals allocate to activities can be regarded as one of the most effective ways to ascertain the importance that people assign to their time, resulting from superimposing individual preferences on institutionalized frameworks and collectively imposed conditions. Total time assigned to activities is constrained by the total time available, which causes an implicit valuation of the time allocated to various activities. The resulting time use is closely related to levels of social satisfaction and overall well-being [57].
However, time allocation research would ultimately be misled if it neglected the fact that societal behavior is the result of choices made by millions of heterogeneous individuals with unique motivations, varying levels of information, and intrinsic purposes with which they carry out specific activities in a chosen order. The omission of patterns, sequences, and episode analysis results in inconsistent estimators, due to missing information. Sequence recognition and classification through data mining tools would allow the correct use of data to drive time use research in the desired direction. By obtaining information about the sequence of activities in which the members of society allocate their time, their individual and/or collective characteristics, and the expenses related to those activities, one could understand the structure of interpersonal relationships and family dynamics more completely [75].
Frequent pattern mining has been of prime interest in the data mining literature [50]. It was originally developed for market basket analysis using association rules, where the motivation was to understand and model which items are bought together with a frequency no lower than a given threshold. This idea was extended to sequential patterns to model whether items are purchased frequently in a particular order [50]. Mining sequential patterns has received increasing attention since it has been widely used in a broad range of applications, such as customer behavior modeling [92], market basket analysis, and shelf space allocation [4], as well as in medical diagnosis [21] and activity-based modeling [73]. Much research has been conducted on sequence data mining in recent decades across many disciplines, such as data mining, database systems, information retrieval, biology and informatics, transport analysis, and industrial engineering. The area of sequence data mining has developed rapidly, producing a diversified array of concepts, techniques and algorithmic tools.
The main purpose of this paper is to address the issue of sequence analysis in time use research by presenting an adequate framework for mining sequences of activities. This novel framework unveils more complex patterns than those currently available on how individuals allocate time to different activities, studying a day as a sequence of activities. This strategy enhances the traditional analysis that focuses on describing activities without considering the order in which individuals perform them.
The remainder of this paper is organized as follows: In Section 2 a brief description of the most common purposes of time use research and the need for sequence analysis are presented. In Section 3 there is a discussion of the methodology. Section 4 is dedicated to presenting the data used and analyzing the results, while conclusions are discussed in Section 5.
Time use studies and their path to sequence analysis
The measurement of time use began relatively early in the twentieth century, spreading worldwide after World War II, and has created an overwhelming amount of literature contributing to the understanding of societal time allocation from various approaches.
Although there are many ways to classify this literature [57], in general, the objectives of most studies lie within these four distinctive (but not mutually exclusive) approaches:
conceptual-theoretical discussions, studies that deal with the understanding and explanation of diverse time allocation issues [1, 14].
data collection methodologies, studies that emphasize the appropriate ways time use information should be gathered, and the level of detail that should be encouraged to perform more complete analysis [39, 13, 80, 6, 58, 99].
analytical modeling frameworks, an extensive area of research that contains diverse theoretical constructs representing the individual and/or household time allocation decision process through the use of models [10, 9, 8, 7, 68, 59], and
descriptive data analysis, which focuses on presenting organized ways of viewing time use data in terms of trends, patterns, activity duration, and other indicators [5, 38, 40, 41, 3].
For this last category, time allocation provides primary data on many kinds of social interaction and provides the basis for defining social groups by behavior [47]. This area of research can provide important data for studies of attitudes, values, cultural styles, and emotions by analyzing the types of activities being performed. It provides a basis for describing and analyzing both individual and societal behavior, by attempting to be comprehensive in recording the entire array of activities in which individuals engage, to understand the trade-offs between different possible allocations of time.
Time allocation among individuals can be studied at several levels, such as individual time allocation, patterns, sequences and episodes. One such strategy focuses on measuring the similarity of activity patterns, which has been the subject of various studies [60, 87, 88, 56, 61, 91, 51, 52, 53, 54, 55, 58, 89].
By providing a complete record of time use by a given population, time use studies allow investigators to test different hypotheses concerning behavior. Various authors have pointed out that time allocation analysis does not permit drawing conclusions about how time might be spent, only about how it is spent [47]. However, by presenting – in greater or lesser detail – the array of alternative activities in which people engage, it is possible to show how a change in time spent on one activity will require a change in time used for some other activity, revealing the relative value individuals assign to them.
Most of the time-use research, however, is focused on the individual’s overall time allocation for specific activities, neglecting the analysis of an episode-based framework and, therefore, overlooking the explicit value of performing activities in a precise order within the period analyzed, i.e. the sequence. This overlooking of attributes and sampling of alternatives is quite common in time-use studies. These simplifications are required, for example, in models of labor supply, where the number of hours worked is summed for the estimation of the model and does not consider the different structures of the time blocks in which the individual decides to work. These simplifications are also required in leisure-time analysis, where there may be many different patterns in the trade-off between leisure activities and travel/work linking two sequences that may provide a better understanding of the value the individual gives to the activities performed. In the same way, these simplifications are also necessary in activity-based research since the number of potential combinations of activities, schedules, duration, and participation choices may be enormous and heterogeneous.
However, this omission becomes a problem when those sequences of episodes and the availability of activities within those sequences are relevant factors in the overall time allocation decision. To improve the analysis of individual time allocation, time-use research requires a detailed representation of numerous quasi-unique alternatives (i.e. sequences of activity-episodes) that are generally omitted.
Sequence analysis could, if applied properly, complement the methods used to collect, present, and analyze temporal data. For example, sequence analysis could analyze the issue of synchronized demand, in which a large group of individuals perform the same sequence or combination of activities leading to both goods consumption and performing other activities, such as drinking alcohol while watching sports [101, 81], which could lead to conducting illegal activities afterwards [77]. On the other hand, sequence analysis could also explore the complementary issue of non-synchronized demand, where a large group of individuals performing different activities could lead, for example, to public transportation congestion on weekend day trips [70], or to peaks in the need for emergency services for weekday and weekend crashes [102]. Finally, sequence analysis could also explore the existence of non-demand, where a large group of individuals perform the same activity – such as sleeping at night – leading to a decrease in the number of activities and goods consumed during those times.
One area of research has analyzed sequences in depth: transport. Models of discrete transport mode choice were first justified by Train and McFadden [97] based on the approach of Becker [22], who proposed a theory of time use in which the utility of individuals depends on commodities and time consumption and where time has a unique value given by the wage rate. Since then, many theories have been developed either to enrich the model proposed by Becker [33, 30, 62, 46] or to adapt it to the study of specific aspects like travel time, including the pioneering contributions of Oort [84], who included travel time in the utility function, and Small [94], who included the time of the trip as an important decision variable. Later, transport models known as “activity based” were developed, from the pioneering work of Kitamura [68] to Bhat et al. [11]. The focus of these models is to understand the context of the trip decision, recognizing that the activity structure, for the individual or the household, is distributed in space and time, generating the different types of travel [57]. In short, activity-based travel demand models predict the daily travel sequences of every individual in a study region. These sequences serve as an important input for travel demand estimation and forecasting in the area [73].
Activity-based travel demand models view travel as a demand derived from activity participation. In this modeling framework, travel is analyzed in relation to daily activity behavior, the context of land-use and transportation networks, as well as personal background information (e.g., socioeconomic conditions). Travel surveys, which collect the full daily activities and travel of a small sample of individuals during one or a few days, are also required as training sets. Once the models are built, they can generate the travel sequences (i.e., chains of activities and travel conducted by a person during a day) of each person in the study area using a Monte Carlo simulation approach. The individual travel sequences are then accumulated across the entire population, resulting in an origin-destination (OD) matrix. In this matrix, each element describes the number of trips between the corresponding pair of locations in the area. This matrix is further assigned to the road network based on a traffic assignment algorithm, and the number of assigned trips on each road can subsequently be used as an important input for mobility-related studies in the region (e.g., travel demand prediction, emission estimation, and transport policy evaluation).
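The accumulation of individual travel sequences into an OD matrix can be illustrated with a minimal sketch. It assumes that each simulated travel sequence is reduced to the ordered list of zones a person visits during the day; the zone labels and the function name `od_matrix` are hypothetical and are not taken from any of the cited models.

```python
from collections import Counter

def od_matrix(travel_sequences):
    """Count trips between each ordered (origin, destination) pair of zones."""
    trips = Counter()
    for zones in travel_sequences:
        # Each pair of consecutive zones in a person's day is one trip.
        for origin, destination in zip(zones, zones[1:]):
            if origin != destination:  # staying in the same zone is not a trip
                trips[(origin, destination)] += 1
    return trips

# Hypothetical simulated days for two persons (zones A, B, C are illustrative).
simulated_days = [
    ["A", "B", "A"],
    ["A", "B", "C", "A"],
]
matrix = od_matrix(simulated_days)
```

Each entry of `matrix` would then be the trip count handed to the traffic assignment step.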
Despite the comprehensive process of activity-based models, a reliable method has been absent to validate the simulated travel sequences [73]. Traditionally, the model results are examined at both internal and external stages of the development process. In the internal validation, the statistics aggregated from the simulated sequences (e.g., the average number of trips per day) are compared with those drawn from the expanded survey data that is not used as the training set of the model development but usually collected in the same survey period. Thus, the internal validation suffers from several limitations that are intrinsic to the shortcomings of the survey data. In contrast, the external validation indirectly evaluates the model results at the traffic assignment stage. The assigned traffic volumes are compared against data from external sources (e.g., traffic counts) on numerous specified roads. However, good outcomes of the compared results might have resulted from the extra processes of OD matrix aggregation and traffic assignment, thus providing no convincing evidence of the accuracy of the model itself.
In this paper, a novel framework for mining sequences of activities is presented for time-use analysis. The proposed strategy is introduced in two steps: First, the Generalized Sequential Pattern (GSP) algorithm [95] is presented. This method is used to extract the sequences of activities performed frequently by individuals within a day. Sequence mining analysis has been widely studied in the literature; we refer the reader to the literature reviews by Febrer-Hernández and Hernández-Palancar [34] and by Bhawna et al. [12]. Next, the applied framework is presented as an experimental setting for the proper use of the algorithm in time-use analysis.
Methodological framework: Sequence mining
The goal of the GSP algorithm is to find frequent sequential patterns in databases. A frequent sequential pattern in our scheme is a subsequence that occurs frequently in the respondents’ data set, such as respondents declaring first that they woke up, then went to work, then did a leisure activity like watching TV, and finally went to bed to sleep again.
Formally, a data set D of data-sequences is constructed, each data-sequence being a set of transactions (activities), ordered by increasing times of occurrence. It is assumed that the activities cannot occur at the same time [95]. The idea is to select the sequences from D that are frequent. The support for a sequence corresponds to the fraction of the data-sequences in D that contain this sequence. A given sequence is contained in a data-sequence in D if this sequence is a subsequence of the data-sequence.
Consider a respondent who declared as a day: sleep, wash, dress, travel to/from work, paid work, travel to/from work, watch TV, sleep. This would be a data-sequence in D. The sequence s1
Notice that it is irrelevant if the reported activities do not match the sequence strictly. For example, the previous respondent reported more activities between “wash” and “travel to/from work”, but it is considered to be a “hit” for the support count anyway. In other words, a sequence like s1
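The notions of containment and support described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the second diary and the candidate pattern are invented for the example, and the function names are hypothetical.

```python
from typing import Sequence

def is_subsequence(pattern: Sequence[str], day: Sequence[str]) -> bool:
    """True if the activities of `pattern` appear in `day` in the same order.

    Other activities may occur in between, so the match need not be strict.
    """
    remaining = iter(day)
    # `in` consumes the iterator, so each activity is searched after the previous hit.
    return all(activity in remaining for activity in pattern)

def support(pattern: Sequence[str], dataset: Sequence[Sequence[str]]) -> float:
    """Fraction of data-sequences in `dataset` that contain `pattern`."""
    if not dataset:
        return 0.0
    hits = sum(is_subsequence(pattern, day) for day in dataset)
    return hits / len(dataset)

# The respondent's day described in the text.
respondent_day = ["sleep", "wash", "dress", "travel to/from work", "paid work",
                  "travel to/from work", "watch TV", "sleep"]
# An invented second diary, used only to make the fraction non-trivial.
other_day = ["sleep", "wash", "watch TV", "sleep"]
# A hypothetical candidate pattern ("dress" is skipped on purpose).
candidate = ["sleep", "wash", "travel to/from work", "paid work", "sleep"]

candidate_support = support(candidate, [respondent_day, other_day])
```

Here the first diary counts as a hit even though "dress" lies between "wash" and "travel to/from work", while the second diary does not, so the support of the candidate is one half.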
The GSP algorithm finds the sequences that are frequent using a simple iterative strategy that stems from the Apriori algorithm for association rules [2]. Notice that finding and evaluating all subsets of sequences in D is a combinatorial problem, and therefore the exhaustive search of all combinations of sequences is computationally prohibitive if the number of activities and respondents is large.
Given a user-defined minimum support as input, the GSP algorithm works as follows: First, the method performs a first pass over the data, identifying sequences of size one that are frequent (all single items with minimum support). In the next pass, GSP creates frequent sequences of size two by combining the sequences of the previous step and computing their support, filtering out those sequences that do not reach the minimum threshold. These frequent sequences of size two are used to generate frequent sequences of size three, and so on, until no more frequent sequences are found.
The main advantage of this algorithm is that it scales well on large data sets thanks to its effective candidate generation step. We refer the reader to [95] for a detailed discussion and formalization of the algorithm.
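The iterative strategy can be sketched as a simplified level-wise search. The sketch below keeps only the join of frequent sequences of size k-1 into candidates of size k and the support filter; it omits the candidate pruning phase, the hash-tree counting, and the time constraints and taxonomies of the full algorithm in [95], so it should be read as an illustration of the idea rather than as GSP itself. The toy diaries and all names are assumptions.

```python
def is_subsequence(pattern, day):
    """True if `pattern` occurs in `day` in order, with gaps allowed."""
    remaining = iter(day)
    return all(activity in remaining for activity in pattern)

def frequent_sequences(dataset, min_support):
    """Return {pattern (tuple): support} for every pattern reaching min_support."""
    n = len(dataset)

    def supp(pattern):
        return sum(is_subsequence(pattern, day) for day in dataset) / n

    # First pass: frequent sequences of size one.
    items = sorted({activity for day in dataset for activity in day})
    level = {}
    for activity in items:
        s = supp((activity,))
        if s >= min_support:
            level[(activity,)] = s

    frequent = {}
    while level:
        frequent.update(level)
        # Join step: p and q combine when p without its first activity
        # equals q without its last activity.
        candidates = {p + q[-1:] for p in level for q in level if p[1:] == q[:-1]}
        # Counting step: keep only candidates that reach the threshold.
        level = {}
        for candidate in candidates:
            s = supp(candidate)
            if s >= min_support:
                level[candidate] = s
    return frequent

# Toy diaries, invented for illustration.
toy_days = [
    ["sleep", "work", "tv", "sleep"],
    ["sleep", "work", "sleep"],
    ["sleep", "tv", "sleep"],
]
found = frequent_sequences(toy_days, min_support=0.6)
```

With these three diaries, the search stops after size three because no two frequent sequences of that size can be joined.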
Applied framework for analyzing sequences of activities
To better understand the sequences detected and to provide a coherent and complete discussion, four steps were taken. The first step deals with activity classification. Given that the number of activities reported by individuals is 69, it would seem preferable to group certain activities that serve the same main purpose.
It is worth noting that classifying activities into certain categories can be conducted in many ways, to serve different purposes [86, 93, 22, 67]. However, we believe that our proposal covers the intention of our analysis adequately.
For this study, we classified the activities into the following groups:
Work: Activities that bring income to the individual or that are exclusively related to that activity. This activity is truly important given that it is the only activity that can provide monetary income, which allows individuals to buy market goods – the only source of utility recognized by traditional economics – and to undertake activities. The main idea regarding the valuation of work was that the real cost of working time could also be measured by the activities the individual postponed because of that work, and not only by the wage rate.
Unpaid labor: Activities that the individual can delegate to a paid worker without losing their intrinsic value, such as laundry, cooking, cleaning. In general, unpaid work activities can be separated into three main activities: household maintenance (or domestic labor), care for others (mostly childcare), and volunteering, with the majority of research focused on the first. Given that unpaid work was drawn from labor supply theory, domestic labor research adopted almost the same framework but with a clear emphasis on how to value work conducted inside the household. Additionally, the presence of children appears to be an important issue when allocating individual and/or household time.
Leisure: Activities for which the individual allocates more time than the minimum required, such as going to the movies, to a bar, to a sports event, or to a party. Beginning with the seminal work of Mincer [78] and Becker [9], and further influenced by labor supply models, there has been an overall acceptance that time can be divided conceptually into two activities: work and non-work (or leisure). According to Feldman and Hornik [35], despite the apparent simplicity and benefit of leisure defined in terms of freely chosen, intrinsically satisfying activities, theorists do not agree about what types of non-work should be defined as leisure, because the freedom of choice of many non-work activities is determined subjectively [16, 63, 64, 32, 46]. Activities defined as leisure are the consequence of subjective perceptions on the part of each consumer. For example, cleaning the pool during the summer could be unpaid work for one individual, and a leisure activity for another. Classifying activities is not a trivial matter, and leisure, as a category, presents an extra difficulty as was mentioned above: subjectivity [65, 66, 82, 23]. The literature shows that there is no clear distinction or agreement on a unique definition for leisure, so the activities classified as such are determined a priori by the researchers using their own conceptions. Thus, leisure can be characterized in many ways. In this study, we will use the following definition: “all activities that we cannot pay someone else to do for us and that we do not have to do at all if we do not wish to” [17].
Tertiary activities: Activities that the individual must perform and for which less time cannot be allocated due to technical restrictions, such as eating and personal care. After deciding on an adequate and operational definition of leisure from the previous “non-work” category, the adequate definition of tertiary activity can be drawn: “those things that we cannot pay other people to do for us, but that we must do at least some of” [17]. According to this definition, this sub-category includes activities such as eating, sleeping, personal care, and traveling. On the one hand, eating, sleeping and personal care are among the most basic and necessary human activities since they are fundamental to survival, i.e. they take minimal amounts of time to perform, and one cannot pay another individual to do them for us. Travel, on the other hand, is a special type of tertiary activity, an “intermediate activity”, that is needed to perform other activities. Within this tertiary activity category, there are two activities that have been covered extensively in the literature: transportation and sleep. Therefore, we chose to categorize them separately due to their importance.
Sleep: Sleeping and naps. Sleep is a very special activity to analyze. It is a necessary activity that has vast implications for multiple aspects of everyday life, consumes a large amount of non-working time, and affects the individual’s health, productivity, appetite, physical and mental performance, among other functions. This specific category deals with both night sleeping time, and day sleeping time (naps).
Travel: Transportation needed to perform other activities at the destination. The specific kind of transportation activity we are referring to is that needed to carry out other particular activities. Individuals who need to perform activities personally in another location, such as working or studying, require traveling to do so, and cannot delegate those trips to another person. Furthermore, there are technical constraints that impose a minimum time required by individuals for traveling, such as the maximum legal speed limit.
Study: Regular schooling, education and homework. As its name indicates, this category deals with educational activities.
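The grouping step can be represented as a simple lookup from reported activity to category. The mapping below is a partial, illustrative sketch: it covers only a handful of activity names that appear in this paper, the assignment of each to a category follows our reading of the definitions above, and the full assignment of the 69 activities is the one given in Table 1.

```python
# Partial, illustrative mapping; the complete assignment is the one in Table 1.
CATEGORY_OF = {
    "paid work not at home": "Work",
    "food preparation, cooking": "Unpaid labor",
    "watch TV": "Leisure",
    "wash, dress, care for self": "Tertiary activities",
    "meals or snacks in other places": "Tertiary activities",
    "sleep": "Sleep",
    "travel to/from work": "Travel",
}

def categorize(day):
    """Replace each reported activity in a diary day with its category.

    Raises KeyError for an activity missing from the mapping, so that
    unclassified activities are not silently dropped.
    """
    return [CATEGORY_OF[activity] for activity in day]

example_day = ["sleep", "wash, dress, care for self", "travel to/from work",
               "paid work not at home", "watch TV", "sleep"]
example_categories = categorize(example_day)
```

Keeping the mapping in one table makes it straightforward to rerun the analysis under an alternative classification.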
The second step deals with the analysis of the daily cycle. Regardless of the specific hour of the day, we believe the existence of a daily cycle is of the utmost importance, with an explicit beginning and end, which can be represented by the individual waking up and going to sleep. Therefore, to provide an analysis that includes the entire daily cycle, the sequences studied are the ones which, regardless of the number of events, start and end with the activity Sleep. Thus, all sequences reported in the next section implicitly include Sleep as both the first and the last activity.
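As a minimal sketch of this restriction, the mined sequences can be filtered so that only those opening and closing with Sleep are kept. The pattern list and function names below are hypothetical and serve only to illustrate the filter.

```python
def is_daily_cycle(pattern, anchor="sleep"):
    """True if the pattern starts and ends with the anchor activity."""
    return len(pattern) >= 2 and pattern[0] == anchor and pattern[-1] == anchor

def inner_activities(pattern):
    """Activities between the opening and closing Sleep."""
    return tuple(pattern[1:-1])

# Hypothetical mined patterns.
mined_patterns = [
    ("sleep", "paid work", "sleep"),
    ("paid work", "watch TV"),
    ("sleep", "watch TV", "sleep"),
]
daily_cycles = [p for p in mined_patterns if is_daily_cycle(p)]
```

The helper `inner_activities` corresponds to reporting a sequence without its first and last Sleep.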
The third step comprises the period of observation. The detection of sequences depends both on the percentage of individuals who performed certain activities in a specific order (support), and the number – and type – of days covered in the analysis.
For this study, three periods of observations were analyzed:
All days: The sequences are detected if they occur on any day of the week.
Work days: The sequences are detected only if they occur between Monday and Friday. Therefore, if a sequence occurs on the weekend, it is not included here. We chose to restrict the analysis to the work days of the week to detect sequences that did not have enough support when analyzing the complete week (e.g. paid work).
Weekend days: The sequences are detected only if they occur between Saturday and Sunday. Therefore, if a sequence occurs on work days, it is not included here. Similarly, we chose to restrict the analysis to the weekend to detect sequences that did not have enough support when analyzing either the complete week or the work days (e.g. leisure time).
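The three periods of observation amount to partitioning the diaries before mining. The sketch below assumes that each diary carries a calendar date, which is a simplification for illustration; a survey file may instead record the day of the week directly. The diaries and names are invented.

```python
from datetime import date

def period_of(diary_date: date) -> str:
    """'weekend' for Saturday or Sunday, 'work' for Monday to Friday."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return "weekend" if diary_date.weekday() >= 5 else "work"

def split_by_period(diaries):
    """Group diary days into the three periods of observation."""
    groups = {"all": [], "work": [], "weekend": []}
    for diary_date, activities in diaries:
        groups["all"].append(activities)
        groups[period_of(diary_date)].append(activities)
    return groups

# Invented diaries: 1 January 2024 is a Monday, 6 January 2024 a Saturday.
diaries = [
    (date(2024, 1, 1), ["sleep", "paid work", "sleep"]),
    (date(2024, 1, 6), ["sleep", "watch TV", "sleep"]),
]
groups = split_by_period(diaries)
```

Support is then computed separately within each group, which is why a sequence can fall below the threshold over all days yet exceed it on work days only.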
The final step concerns the position of each activity within the sequence, given that some activities are linked over time. Our purpose is to detect and analyze these types of sequences:
Some activities have to come before or after others (e.g. traveling to work before working).
Some activities often come before or after others (e.g. eating before or after alcohol consumption).
Some activities rarely come before or after others (e.g. exercising after alcohol consumption).
In general, some sequences are tightly locked together, while others are more flexible and can be done at different times, such as domestic chores on weekend days (with more time to do them) or on working days (to allow resting on the weekend).
Tightly locked sequences make ‘blocks’ that structure the rhythm of the day. These arrangements have distinctive features both of timing and duration.
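One simple way to inspect which activities come before or after a given activity is to tally its immediate neighbours across diaries. This is a descriptive sketch under our own assumptions, not part of the GSP procedure: it counts only adjacent activities, whereas the mined sequences allow other activities in between. The diaries and names are invented.

```python
from collections import Counter

def neighbours(dataset, target):
    """Count the activities reported immediately before and after `target`."""
    before, after = Counter(), Counter()
    for day in dataset:
        for i, activity in enumerate(day):
            if activity != target:
                continue
            if i > 0:
                before[day[i - 1]] += 1
            if i + 1 < len(day):
                after[day[i + 1]] += 1
    return before, after

# Invented diaries.
sample_days = [
    ["sleep", "travel to/from work", "paid work", "travel to/from work", "sleep"],
    ["sleep", "travel to/from work", "paid work", "watch TV", "sleep"],
]
before_work, after_work = neighbours(sample_days, "paid work")
```

A predecessor that appears in nearly every diary points to a tightly locked pair, while a flat distribution of predecessors suggests a flexible position.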
The dataset
We used data from the Multinational Time Use Study. The MTUS is an ex-post harmonized cross-time, cross-national, comparative time-use database, coordinated by the Centre for Time Use Research at the University of Oxford.1
This dataset is available to download after registration in
The complete list of activities can be seen in Table 1.
The American Time Use Survey (ATUS) has been the source of interesting economic/econometric research in recent years.
Flood et al. [36] used Multinomial Logit Latent Class Analysis to discuss eight daily temporal pathways and their associations with individual characteristics, drawing on four broad types of time (contracted, committed, necessary, and free). Their analysis highlights the variations and similarities across pathways, the impact of paid work in structuring daily life, the social patterning of sleep and leisure, and socio-demographic profiles of the pathways of working-age Americans.
Das et al. [29] used the American Time Use Survey (ATUS) to characterize how different consumers in the US might use Autonomous Vehicles (AVs). Their approach was to identify sub-groups of the population likely to benefit from AVs, and compare their activity patterns with an otherwise similar group (working individuals who drive to work with long commutes, working individuals with long commutes who take public transportation, and elderly retired people). The authors state that the economic, environmental, and social implications of AV are very difficult to predict but are expected to be transformative. The contribution of their work is that it utilizes time-use surveys to suggest how AV adoption could induce lifestyle changes both inside and outside of the vehicle.
Dong et al. [31] extended the traditional analysis of leisure activity participation by including leisure activities that require the use of a PC. They studied the substitution effects with both in-home and out-of-home leisure activities, and the time budgeted to each of them. Results show that there is little substitution effect between leisure with PC and the relative time spent on it, and in-home and out-of-home leisure episodes. Households with more children and full-time working parents are more likely to engage in in-home and PC related leisure activities (especially on the weekends).
Cornwell [25] analyzed the affinities between the network framework and the sequence framework by showing how several key network concepts – including network diameter, size, density, centralization, and homophily – correspond to key sequence concepts, such as sequence length and the extent of variation in the popularity of transitions between different sequence elements. He demonstrated the sequence-network approach via an analysis of the structure of the network of activity pathways that are followed by individuals in different age groups of the ATUS. Results show that older adults’ pathways are less complex, as they involve fewer activities at different times, fewer transitions between different activity-times, are more centralized around fewer and more dominant sequence pathways and involve less switching between different activities.
List of 69 activities reported by ATUS – grouped by activity category
Freedman et al. [37] used national time diary data based on the ATUS to explore whether there are signature care patterns throughout the day and whether these care patterns have implications for the wellbeing experienced by caregivers. Results show that caregiving follows a rollercoaster pattern over the day, peaking at mealtimes. Sequence analysis suggests five distinctive caregiving patterns, which vary by both the demographic characteristics of the caregiver (gender, work status), and the care arrangement type (relationship to recipient, and whether or not the caregiver is the sole caregiver for the recipient). The 40% who provide only marginal assistance of about one hour report lower experienced wellbeing than the 28% who provide sporadic assistance with a mix of activities for about two hours.
To simplify the analysis, all the sequences found will be discussed following the activity classification presented and used in the previous section: Paid Work, Unpaid Labor, Leisure, Tertiary Activities, Sleep, Travel, and Study, although the latter did not appear in any of the sequences.
Highest support 3-activity sequences – by activity category and period of observation
The minimum support threshold used by the GSP algorithm was set to 5%. We found empirically that this value represents an adequate compromise between the length of the activity sequences and the size of the candidate sequences set for the analysis. Tables 2–5 present a summary of the results found. Table 2 shows the sequences with the highest support, classified by activity category and period of observation. For all categories, the highest-support result is a 3-activity sequence. Table 3 shows the 4-activity sequences with the highest support for every category and period of observation. Two sequences are reported per period and category, one with the studied activity before an accompanying activity, and one with the studied activity after the accompanying activity; both of them occur within a Sleep-x-Sleep framework. Table 4 shows the longest sequence per period of observation and type of activity, selected from a pool of sequences that contain at least one activity of that category. For this table, the sequence with the largest support among those of the same length and with support greater than 5% is selected. Finally, Table 5 shows the sequences with the largest number of activities of the same category. For Tables 4 and 5, the first and last activity (Sleep) are not shown.
Highest support 4-activity sequences – by activity category and period of observation
Longest sequences with the highest support – by activity category and period of observation
Sequences with the greatest number of activities of the same category – by activity category and period of observation
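The selection rules behind these summary tables can be sketched as two small queries over the mined sequences. The support values below are invented for illustration and are not results of this study; the function names are hypothetical.

```python
def highest_support_of_length(mined, length):
    """Pattern of the given length with the largest support, or None."""
    same_length = {p: s for p, s in mined.items() if len(p) == length}
    if not same_length:
        return None
    return max(same_length.items(), key=lambda item: item[1])

def longest_with_highest_support(mined, min_support=0.05):
    """Longest pattern reaching min_support; ties in length go to the larger support."""
    eligible = {p: s for p, s in mined.items() if s >= min_support}
    if not eligible:
        return None
    return max(eligible.items(), key=lambda item: (len(item[0]), item[1]))

# Invented supports, for illustration only.
mined = {
    ("sleep", "paid work", "sleep"): 0.40,
    ("sleep", "travel", "paid work", "sleep"): 0.30,
    ("sleep", "paid work", "meal", "sleep"): 0.35,
    ("sleep", "travel", "paid work", "meal", "sleep"): 0.04,
}
best_three = highest_support_of_length(mined, 3)
longest = longest_with_highest_support(mined)
```

In this toy input the five-activity sequence falls below the 5% threshold, so the longest eligible sequence has four activities.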
Out of the eight activities classified under paid work, the one that was carried out by the greatest number of individuals, regardless of the period of observation, is “paid work not at home” (see Table 2). The picture changes when differentiating by period of observation. Paid work not at home has a higher relevance when the analysis is limited to work days, on which 44.3% of the individuals reported the sequence Sleep – Paid work not at home – Sleep. This contrasts sharply with the weekend framework, where only 11.3% of the individuals reported working on Saturday or Sunday. This category has one of the lowest supports of the seven categories: more people work than take naps, but fewer work than do unpaid labor, leisure and tertiary activities, or travel.
When analyzing a 4-activity sequence, the sequences detected are the same regardless of the period of observation; however, the support reported is, as expected, higher within a work day period, and the lowest on a weekend framework.
The highest support sequences found are:
Sleep – paid work not at home – meals or snacks in other places – Sleep (37.9% on work days)
Sleep – travel to/from work – paid work not at home – Sleep (39.2% on work days)
The importance of these sequences resides in the acknowledgment that in order to work outside of the home, “traveling” and “eating” are the most common accompanying activities.
The longest sequence present in this category had 10 events (starting and ending with sleep) and occurred when working days were analyzed. This sequence has 6.6% of support and is described as follows:
Sleep – wash, dress, care for self – travel to/from work – paid work not at home – meals in other places – paid work not at home – travel to/from work – meals in other places – watch TV – Sleep
This can be seen as a common day of a worker, who wakes up, gets ready for work, goes to work, and then relaxes back home, eating and watching TV.
Finally, when analyzing repetitive activities within a sequence, we can see that paid work presents the longest sequence on work days, with a support of 8.7%. Out of the eight paid work activities, only “paid work not at home” was present in such a sequence. According to Table 4, individuals reported 4 distinct paid work episodes within their day.
Unpaid labor
The characteristic of higher support on work days present in paid work is replicated by unpaid labor. For that activity classification, out of the 16 activities labeled as unpaid work, the one present in the sequence with the highest support is “food preparation, cooking”, independent of the period of observation. Over 50% of the individuals reported that, within a daily cycle, they performed the sequence Sleep – Food preparation, cooking – Sleep; with almost the same support for all periods. In terms of comparison, we can see that unpaid labor has higher support than work and sleep (naps), which can be readily understood since more people focused on preparing meals than working (not everybody in the sample worked) or taking naps.
When analyzing a 4-activity sequence, and as with paid work, the sequences detected are the same regardless of the period of observation; however, the support reported varies with the period.
The highest support sequences found are:
Sleep – food preparation, cooking – meals or snacks in other places – Sleep (46.9% on work days)
Sleep – shop, person/hhld care travel – purchase goods – Sleep (43.0% on weekend days)
The importance of these sequences resides in the finding that, among all unpaid labor activities, those that deal with the consumption of goods (purchase, cooking and eating) are the most common activities.
The longest sequence with the highest support present in this category had 9 events (one less than paid work) and occurred in the analysis of weekend days. This sequence has 7.7% of support and is described as follows:
Sleep – shop, person/hhld care travel – purchase goods – shop, person/hhld care travel – purchase goods – shop, person/hhld care travel – meals in other places – watching TV – Sleep
Compared with the previous category, this sequence differs markedly from those that occur within a paid work framework. Here, the sequence can be read as a typical day of shopping for the household followed by relaxing at home, eating and watching TV, which is the same end-of-day scenario as in paid work.
Finally, in terms of repetitive activities, we can see in Table 5 that unpaid labor does not have just one activity repeated many times. Rather, it presents different focuses depending on the period of observation. What is noticeable is that the activity, “supervise, accompany, other childcare” is present only on work days and is part of a sequence in which there are three different episodes of childcare. This departs from weekend days and all days, which focus on food preparation and setting the table.
Leisure
Regarding leisure, which presents the longest list of choices (29 out of 69 activities), one result stands out: the only leisure activity present in any sequence is “Watching TV, video, DVD”. In terms of the sequences with the highest support that include any leisure activity, the sequence Sleep – watching TV, video, DVD – Sleep has a support that ranges from 74% to 78%, showing that more people watch TV than work, prepare food, take a nap, or travel. Another interesting fact is that the period comprising only Saturday and Sunday now presents the highest support of the three periods, in contrast with paid work and unpaid labor. This could be interpreted as people engaging in leisure activities more often on weekends than on work days.
When analyzing a 4-activity sequence, and as in the two previous categories, the sequences detected are the same regardless of the period of observation; however, and going against results found for paid work, here the highest supports are detected on weekend days.
The highest support sequences found are:
Sleep – watch TV, video, DVD – watch TV, video, DVD – Sleep (44.9% on weekend days)
Sleep – meals or snacks in other places – watch TV, video, DVD – Sleep (67.8% on work days)
Results of these sequences show that the only leisure activity present in these sequences is watching TV, video, and/or DVD, as a studied activity and as an accompanying activity. Furthermore, eating is shown to be the other activity present in the highest supported 4-activity sequence.
The longest sequence with the highest support is the same as the one presented in unpaid labor. It has 9 events and occurred in the analysis of weekend days, with the leisure activity being watching TV:
Sleep – shop, person/hhld care travel – purchase goods – shop, person/hhld care travel – purchase goods – shop, person/hhld care travel – meals in other places – watching TV – Sleep
Finally, when analyzing repetitive activities within a sequence in Table 5, we can see that leisure activity presents the longest sequence on weekend days, with a support of 8.3% and the activity being “watching TV”. According to the results, individuals reported 4 distinct TV episodes within their day.
The fact that watching TV is the only leisure activity present in the sequences could mean that individuals tend to enjoy such a passive leisure activity as watching TV after the effort made throughout the day. This is corroborated by several studies. For example, [27] analyzed whether the quality of any experience was more influenced by whether a person was at work or at leisure. Results showed that regardless of the quality of the experience, respondents are more motivated when at leisure than at work. The most common leisure activity reported by their study was watching TV. Their results show that workers need to recuperate from the intensity of work in low-intensity, free-time activities characterized by relaxation, even though most of these activities might be unsatisfying, uncreative, etc. This would explain why people prefer to watch TV, try to sleep, or in general vegetate at home, even though they do not enjoy doing these things. Perhaps they are so exhausted from the stimulation at work that they lack the energy to structure and enjoy their free time.
The same idea was presented many years later by [24], who analyzed television viewing among workers and found that television viewing is positively correlated with work hours across countries. The reason, according to the author, is that individuals possess a certain amount of energy and that they usually expend it during their day. Among the activities requiring a high amount of mental energy are work and some forms of leisure activities (e.g. debating and playing cards), while watching television requires a very small amount of concentration. Thus, workers who have less energy left for leisure activities by the end of the day turn on the screen and avoid more demanding activities [24].
Other studies have focused on the relationship between watching TV and other leisure activities. For example, [28] found great differences in how teenagers felt when viewing television as opposed to how they felt when involved in sports and games. The paradox presented by the results of this study has since been replicated in several different contexts; namely, that although teens experience active leisure as a much more positive experience than watching TV, they spend almost 10 times as many hours in the less enjoyable activity. [26] compared the experience of television viewing to other leisure activities among adults, and [72] found significant differences between listening to music and watching television. [45] showed that adults enjoy leisure activities only when these are perceived to be freely chosen, and [20] found similar differences among teenagers involved in various sports: the same sports were experienced more positively when controlled by the teenagers than when controlled by adults.
Tertiary activities
The Tertiary Activities category presents the sequences with the highest support of all activity categories even though it has only four activities within its framework. The sequences reported include, for all the periods of observation, Sleep – Meals or snacks in places other than work or school – Sleep, with almost the same high support regardless of the period of observation (85%). This could be a verification of common sense given that by nature humans must eat every day.
When analyzing a 4-activity sequence, and as in all the previous categories, the sequences detected are the same regardless of the period of observation; however, the support reported varies with the period.
The highest support sequences found are:
Sleep – meals or snacks in other places – watch TV, video, DVD – Sleep (67.8% on weekend days)
Sleep – wash, dress, care for self – meals or snacks in other places – Sleep (62.2% on work days)
Results of these sequences show that eating and personal care are the most frequent tertiary activities within these sequences. Furthermore, the first sequence is the same as the one detected in the leisure category, with the only leisure activity present in these sequences being watching TV, video, and/or DVD.
The longest sequence with the highest support is the same as that presented in unpaid labor and leisure activities. It has 9 events and occurred when weekend days were analyzed, with the tertiary activity being eating meals:
Sleep – shop, person/hhld care travel – purchase goods – shop, person/hhld care travel – purchase goods – shop, person/hhld care travel – meals in other places – watching TV – Sleep
Finally, in terms of repetitive activities, we can see in Table 5 that, as with unpaid labor, tertiary activities do not have just one activity repeated many times. Rather, they present different focuses depending on the period of observation, with a support ranging from 6% to 15%.
Sleep
Sleep is an interesting activity to analyze. Results show that between 19% and 21% of individuals reported taking a nap within a daily cycle. As expected, and following the same trend as leisure, the highest support appears on weekends (21%), compared with work days (19%). This could be interpreted as individuals preferring (or having the time) to take naps on the weekend. Among the 8 categories, this one has the lowest maximum support. However, the percentage of individuals reporting the sleep sequence on weekends (21%) is almost double the percentage reporting paid work not at home on weekends (11%).
In the analysis of a 4-activity sequence, as with all the previous categories, the sequences detected are the same regardless of the period of observation; however, the support reported is, as expected, higher within a weekend period, and the lowest on a work day framework.
The highest support sequences found are:
Sleep – Sleep (nap) – watch TV, video, DVD – Sleep (12.2% on weekend days)
Sleep – meals or snacks in other places – Sleep (nap) – Sleep (16.4% on weekend days)
Results of these sequences show that besides taking a nap, eating and watching TV are the activities most present within these sequences, presenting a leisure/rest framework on a weekend basis. However, it is noticeable that, among all the categories, taking a nap has the lowest support of all sequences reported in Table 4.
Another interesting insight appears in the analysis of sleep (i.e. nap time in this context). Sleep is the only category that does not share a sequence with another category, and it has the lowest support of all the activity categories. In addition, the highest supported sequence on work days is the only sequence present in Table 5 that incorporates the activities Adult care and Child/adult care travel. Taking a nap is a special occasion for many individuals, especially for workers, so it is no surprise that the highest support is seen when observing the weekend, and the lowest support when studying work days.
Research is continuously linking the quality and quantity of sleep with its impact on many aspects of societal behavior, and linking individuals’ characteristics with their sleep time allocation. Sleep has been an important topic in the medical and biological sciences, with studies focusing on cognitive performance, alertness, memory, decision making, reasoning, problem solving and accidents [100, 98], obesity [18, 74, 42, 96, 83], and overall health [43, 44, 48, 49]. For example, studies regarding sleep deprivation [19] found that sleep serves the purpose of physical and mental restitution, making it a necessary activity. In terms of the physiological links between lack of sleep and chronic ill-health, the evidence suggests that a severe lack of sleep alters liver function adversely [76]. Sleep also plays a restorative role in the immune system and the lack of sleep impairs it [85, 71], while decreased sleep duration or quality may also increase the risk of diabetes [69].
For centuries, sleep has been considered to be a passive state, a simple suspension of activities that individuals must undertake on a daily basis to give them the necessary physical and mental conditions to keep performing other activities when awake. Furthermore, and even though sleep takes more time, on average, than any other single activity (individuals spend a third of their lives asleep), it is generally viewed only as a mandatory activity from which to take time if needed, i.e. a trade-off activity. However, the true and active nature of sleep is that it is not only fundamental to well-being, health, and productivity, but it also impacts significantly on how individuals allocate time to other important activities such as work and leisure.
When specifically focused on taking naps, studies have analyzed its relationship to alertness, performance, and as a recuperative activity for sleep restriction. For example, [79] focused on the effects of a nap regimen on nocturnal sleep, circadian rhythms, and evening alertness, and on performance levels in the healthy elderly. Their reason for limiting their sample to elders was not only that, a priori, researchers thought that older adults are less likely to be prevented by employment from taking naps, but also that there are age-related changes in sleep and circadian rhythms that may predispose older people to take an afternoon nap. Their results show that, although there appear to be some negative consequences of the nap for nocturnal sleep, mostly in terms of reduced sleep efficiency and earlier waketimes, there may be some beneficial effects too. Objective measures of evening sleepiness, such as evening performance, show a significant improvement with afternoon napping.
Travel
Travel is an interesting activity given that it is directly tied to other activities that require moving from one place to another. The Travel category presents seven types of trips. The sequences reported with the highest support include, for all periods of observation, Sleep – shop, person/household care travel – Sleep, with the highest support on weekend days (59%). Comparing types of day is not straightforward: weekend days could have the highest support for traveling because workers have more free time to perform other activities and therefore may need to leave the house and travel. Conversely, if individuals choose to stay home on weekends and, for example, watch TV, most trips would be concentrated on work days, which involve at least two trips: traveling to and from work. This commuting trip was expected to appear on work days; however, shop, person/household care travel dominated all the periods of observation.
In the 4-activity analysis of travel, only one sequence has the highest support across all periods; furthermore, the support reported is, as expected, higher within a weekend period, and the lowest on a work day framework.
The highest support sequence found is:
Sleep – shop, person/hhld care travel – shop, person/hhld care travel – Sleep (49.8% on weekend days)
This shows that the most important travel activity within this framework is personal and household care travel, an activity previously detected in the tertiary activity category.
The longest sequence with the highest support is the same as the one presented in unpaid labor, leisure and tertiary activities. It has 9 events and occurred when weekend days were analyzed:
Sleep – shop, person/hhld care travel – purchase goods – shop, person/hhld care travel – purchase goods – shop, person/hhld care travel – meals in other places – watching TV – Sleep
Finally, in the analysis of repetitive activities within a sequence, we can see that travel presents the highest support on work days (8.7%), and the longest sequence on weekend days (5.8% of support) with 6 episodes of shop, person/hhld care travel.
Other insights
Table 1 presents 69 different activities to analyze and, so far, we have focused on the sequences with the highest support for categories previously discussed, from paid work to travel. However, interesting patterns can be found by focusing on the differences in sequences with the same activities, or in activities with low support.
Wash, dress, care for self
It is quite evident that people take showers and have their breakfast at the beginning of the day. However, there are respondents that eat first and wash themselves later, and others who wash themselves first and eat later.
Sleep – wash, dress, care for self – meals or snacks in other places – Sleep (47.03% on any day of the week)
Sleep – meals or snacks in other places – wash, dress, care for self – Sleep (45.01% on any day of the week)
Meals or snacks in other places
Meals are most often combined with watching TV, video, or DVD, and a sizable share of individuals also cook before eating (41.9% support). The sequences with the highest support are:
Sleep – meals or snacks in other places – watch TV, video, DVD – Sleep (64.7% on any day of the week)
Sleep – meals or snacks in other places – meals or snacks in other places – Sleep (49.7% on any day of the week)
Sleep – wash, dress, care for self – meals or snacks in other places – Sleep (47.03% on any day of the week)
Sleep – meals or snacks in other places – wash, dress, care for self – Sleep (45.01% on any day of the week)
Sleep – food preparation, cooking – meals or snacks in other places – Sleep (41.9% on any day of the week)
Sleep – shop, person/hhld care travel – meals or snacks in other places – Sleep (41.2% on any day of the week)
Worship and religion
Sleep – worship and religion – Sleep (6.8% on any day of the week)
Sleep – wash, dress, care for self
These are the only sequences in which worship and religion appear. Their support is about 5%, and showering/dressing is the only activity found in a sequence that people do before worship.
Laundry, ironing, clothing repair
Sleep – Laundry, ironing, clothing repair – watch TV, video, DVD – Sleep (14.5% on any day of the week)
Sleep – Laundry, ironing, clothing repair – meals or snacks in other places – Sleep (12.8% on any day of the week)
Sleep – Laundry, ironing, clothing repair – wash, dress, care for self – Sleep (10.4% on any day of the week)
The longest sequence for individuals who do laundry, ironing, and/or clothing repair is:
Sleep – Laundry, ironing, clothing repair – shop, person/hhld care, travel – purchase goods – shop, person/hhld care, travel – watch TV, video, DVD – Sleep
So these individuals report watching TV during or after doing domestic work.
Relax, think, do nothing
Sleep – meals or snacks in other places – relax, think, do nothing – Sleep (12.2% on any day of the week)
Sleep – relax, think, do nothing – watch TV, video, DVD – Sleep (11% on any day of the week)
Sleep – wash, dress, care for self – relax, think, do nothing – Sleep (10.3% on any day of the week)
Here we can see that watching TV, video or DVD is present for 11% of the sample in a relaxing day.
Receive or visit friends
Sleep – receive or visit friends – watch TV, video, DVD – Sleep (24.1% on any day of the week)
Sleep – other travel – receive or visit friends – Sleep (22.6% on any day of the week)
Sleep – shop, person/hhld care travel – receive or visit friends – Sleep (22.4% on any day of the week)
With this activity, traveling and purchasing of goods are present in the highest support sequences besides washing themselves and eating.
In this work, a novel framework for time use analysis using data mining techniques is proposed. The main idea is to rank sequences of activities that are performed within a day in terms of their frequency (support). The Generalized Sequential Pattern (GSP) algorithm is used to identify the sequences of activities that are frequent. The algorithm uses a simple heuristic: it identifies activities that have a support above a predefined threshold, combining them in the next step to create sequences of size two. Those sequences that do not reach the minimum threshold are filtered out, and the ones that are frequent are then used to create sequences of size three. The process is repeated until no more frequent sequences can be found. Finally, the sequences of various lengths can be ranked to identify the most valuable ones for understanding how individuals decide to spend their days.
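The level-wise heuristic described above can be sketched as follows. This is a simplified illustration in the spirit of GSP, not the authors' implementation: it treats every diary as a plain ordered list of single activities, counts a sequence as supported when its activities appear in order with gaps allowed, and omits the time constraints and the efficient candidate counting of the full algorithm. The toy diaries, the 50% threshold, and all function names are assumptions chosen for illustration.

```python
def contains_in_order(diary, pattern):
    """True if `pattern` appears in `diary` in order (gaps allowed)."""
    remaining = iter(diary)
    return all(activity in remaining for activity in pattern)


def support(diaries, pattern):
    """Share of diaries that contain `pattern`."""
    return sum(contains_in_order(d, pattern) for d in diaries) / len(diaries)


def mine_frequent_sequences(diaries, min_support):
    """Level-wise search; returns {pattern (tuple): support}."""
    if not 0 < min_support <= 1:
        raise ValueError("min_support must be in (0, 1]")
    # Level 1: single activities that reach the threshold.
    activities = sorted({a for d in diaries for a in d})
    level = {(a,): support(diaries, (a,)) for a in activities}
    level = {p: s for p, s in level.items() if s >= min_support}
    frequent = {}
    while level:
        frequent.update(level)
        # Join step: extend p with the last activity of q whenever
        # p without its first item equals q without its last item.
        candidates = {p + (q[-1],) for p in level for q in level
                      if p[1:] == q[:-1]}
        # Count step: keep only candidates that reach the threshold.
        scored = {c: support(diaries, c) for c in candidates}
        level = {c: s for c, s in scored.items() if s >= min_support}
    return frequent


# Hypothetical toy diaries.
diaries = [
    ["sleep", "travel", "work", "meals", "tv", "sleep"],
    ["sleep", "travel", "work", "tv", "sleep"],
    ["sleep", "meals", "tv", "sleep"],
    ["sleep", "cook", "meals", "tv", "sleep"],
]

frequent = mine_frequent_sequences(diaries, min_support=0.5)
# Rank by length first, then by support, as one possible ordering.
ranked = sorted(frequent.items(), key=lambda kv: (-len(kv[0]), -kv[1]))
for pattern, s in ranked[:3]:
    print(" - ".join(pattern), f"{s:.0%}")
```

The loop stops on its own because no sequence longer than the longest diary can reach a positive threshold. Ranking by length and then by support is only one of several possible orderings.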
In order to gain insight into the behavior of the respondents, the various activities are classified into six groups: Paid Work, Unpaid Labor, Leisure, Tertiary Activities, Sleep, and Travel. Additionally, the analysis is further divided into work days and weekend activities. Several sequences are discussed for each group in terms of their frequency, associated with the corresponding references. For example, we discovered that the preferred activity performed after work is eating meals or snacks in other places with a support of 37.9% on working days, but the most frequent activity before work is, as expected, traveling to work, with 39.2% of support on work days. Furthermore, the only leisure activity detected in all sequences is watching TV, video, or DVD and the only travel activity (besides the traveling to/from work reported before) is shopping for personal or household care. Finally, taking naps is an activity most present on the weekends and mostly linked to eating and watching TV.
The aim of this study is to provide a tool for validating hypotheses and answering questions such as “which activity or set of activities is most frequently performed by individuals before/after Activity X is undertaken?”. The proposed method makes it possible to reveal sophisticated patterns and test precise hypotheses regarding time use. Rather than proposing our own hypothesis, we illustrate the potential of the proposed tool via exploratory analysis with several examples, relating our main findings to the existing literature on time use and on some specific categories, such as work, leisure, and sleep.
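A question of the form “which activity is most frequently performed before/after Activity X?” can be answered directly from 2-activity sequence supports. The sketch below is one hypothetical way to do so, assuming the same gap-tolerant notion of order as before: an activity counts as “before X” in a diary if it occurs anywhere before the last X episode, and as “after X” if it occurs anywhere after the first. The function name, the `exclude` parameter, and the toy diaries are illustrative assumptions.

```python
from collections import Counter


def before_after_support(diaries, target, exclude=()):
    """Support of (a, target) and (target, a) for every other activity a."""
    before, after = Counter(), Counter()
    for diary in diaries:
        if target not in diary:
            continue
        first = diary.index(target)
        last = len(diary) - 1 - diary[::-1].index(target)
        # Use sets so each diary contributes at most once per activity.
        before.update(set(diary[:last]))
        after.update(set(diary[first + 1:]))
    skip = set(exclude) | {target}
    n = len(diaries)

    def to_support(counts):
        return {a: c / n for a, c in counts.items() if a not in skip}

    return to_support(before), to_support(after)


# Hypothetical toy diaries (lists, one per respondent-day).
diaries = [
    ["sleep", "travel", "work", "meals", "tv", "sleep"],
    ["sleep", "travel", "work", "tv", "sleep"],
    ["sleep", "meals", "tv", "sleep"],
    ["sleep", "cook", "meals", "tv", "sleep"],
]

# Sleep opens and closes every diary, so it is excluded from the ranking.
before, after = before_after_support(diaries, "work", exclude={"sleep"})
print("most frequent before:", max(before, key=before.get))
print("most frequent after:", max(after, key=after.get))
```

Supports are computed over all diaries, not only those that contain the target activity, which keeps them comparable with the supports reported for longer sequences.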
Future work will focus on further analyzing sequences to shed light on issues not covered here, such as dividing sequences by the time of day, including activity duration as a variable, and analyzing consecutive activities.
Acknowledgments
Jorge Rosales-Salas acknowledges Fondecyt, Chile, Grant 11180337. Sebastián Maldonado was supported by FONDECYT projects 1160738 and 1181809. The authors gratefully acknowledge financial support from CONICYT PIA/BASAL AFB180003.
