Abstract
Time diaries are a well-established method for providing population estimates of the amount of time and types of activities respondents carry out over the course of a full day. This article focuses on a computer-assisted telephone application developed to collect multiple, same-day 24-hour diaries from older couples who participated in the 2009 Panel Study of Income Dynamics (PSID). We present selected findings from developmental and field activities, highlighting methods for three diary enhancements: (1) implementation of a multiple, same-day diary design; (2) minimizing erroneous reporting of sequential activities as simultaneous; and (3) tailoring activity descriptors (or “follow-up” questions) that depend on a precoded activity value. A final section discusses limitations and implications for future time diary efforts.
Introduction
Researchers have long been interested in systematically measuring how people use time, arguably the most fundamental of human resources. Although time is a resource with internationally standardized units (i.e., hours, minutes, and seconds), individuals report time as they experience it subjectively, and thus reported time is prone to measurement error. Time diaries have been found to be one of the more reliable ways of measuring time use (Juster 1986). In time diaries, respondents are asked to recall a sequential chronology of events or episodes, typically over a recent 24-hour recall period. For each episode, they are typically asked to report what was done and for how long along with descriptors such as who was present and where the activities occurred. These questions provide population estimates of the amount of time and types of activities individuals carry out over the course of a full day.
The methodology by which 24-hour diaries have been collected has evolved over time. Early diaries were collected by interviewers using a paper-and-pencil methodology, an approach still used today. The conventional approach is to provide columns to write in the activity and descriptors that pertain to all activities, with the rows representing time in brief (e.g., 15- or 30-minute) segments. Some paper-and-pencil diaries provide an additional column in which information about what else the respondent was doing at that time can be recorded (i.e., to capture multitasking; Robinson 1985). However, paper-and-pencil diaries do not easily accommodate skip patterns and thus do not readily lend themselves to tailoring follow-up questions to particular types of activities.
Although paper-and-pencil collection is still widely used, diaries that are collected through computer-assisted telephone interview (CATI) applications are becoming more common. The American Time Use Survey (ATUS), for instance, uses a CATI application to collect a single 24-hour diary covering the previous day from one household member. The CATI methodology has both advantages and disadvantages (Phipps and Vernon 2009). A distinct advantage is that the CATI approach introduces a certain level of consistency into the data collection process. Unlike the paper diary, which may be filled out in different ways and at different times by different respondents, the CATI asks all respondents about the previous 24-hour day. The computer application is also flexible, so conversational-sounding probes (“What is the next thing you remember doing?”) and reviews (“So, you were eating lunch from 12:00 p.m. to 1:00 p.m., is that correct?”) can be built into it. Follow-ups that are tailored to particular types of activities may also be incorporated into the design of the CATI, although to our knowledge, systematic development and evaluation of such a tailoring scheme has not appeared in the literature.
A potential disadvantage of the CATI approach is that it is not especially conducive to collecting multiple time lines (e.g., simultaneous or secondary activities). Time durations in ATUS, for example, are programmed to correspond to a single 24-hour time line, with as few gaps in time as possible. The ATUS application allows volunteering of simultaneous activities, which are recorded by interviewers, but such information is not systematically coded or released. More generally, evaluation of multitasking within computer-assisted applications has been limited (see, e.g., Drago 2011).
The time use literature, with some notable exceptions, also has a long history of focusing on a single member of a household (National Research Council 2000). Although many studies (including ATUS) collect only a single diary from one respondent, others have collected multiple diaries per respondent or one or more diaries from multiple members of a household. Notably, the European statistical agency (Eurostat) recommends collecting two time diaries from all household members, one for a weekday and one for a weekend day, with all household members completing their diaries on the same day. The agency also recommends random assignment of diary days and only limited postponement of assigned days, preferably by no more than 14 days. To date, developing flexible procedures to obtain a sample of same-day diaries from couples using a CATI methodology has not been a focus of the time use literature.
This article introduces a CATI methodology for collecting multiple, same-day 24-hour diaries with tailored follow-up items from couples. We first describe the study’s major developmental phases, including lab-based and pretest activities, carried out at the University of Michigan’s Survey Research Center. We then present findings related to three features of the CATI time diary application: (1) implementation and evaluation of the multiple diary, same-day design; (2) attention to distinctions between sequential and simultaneous activities; and (3) tailoring activity descriptors (or follow-up questions). A final section summarizes lessons learned for future studies interested in the collection of time use data from multiple family members by telephone.
Background
The Disability and Use of Time (DUST) study was administered as a supplement to a subset of older couples in the 2009 Panel Study of Income Dynamics (PSID), the world’s longest-running national panel study. Here we describe the basic PSID design, developmental efforts to create DUST, and the final DUST design.
The PSID Design
Begun in 1968, the PSID is a longitudinal study of a national sample of U.S. individuals and the families in which they reside. The PSID sample was originally drawn from two independent samples: a nationally representative sample of roughly 3,000 households and an oversample of roughly 2,000 low-income families. From 1968 to 1997, the PSID interviewed and reinterviewed individuals in these families every year, whether or not they were living in the same dwelling or with the same people. Since 1997, interviews have been conducted biennially. Because children who leave their parents’ households (whom the PSID calls “split-offs”) continue to be followed, and because a sample of immigrants was added in 1999, the design, when weighted, produces a nationally representative sample of families (exclusive of post-1999 immigrants) each year. Reinterview rates have consistently been approximately 98% per year (96% over 2 years), and the sample of families now exceeds 8,800. For additional details on the PSID, see Institute for Social Research (2011).
The core PSID is collected with a CATI application. For almost all families, all questions are answered by one respondent, about evenly split between men and women in recent years. The entire family listing and interview takes on average 75 minutes to complete. The PSID has collected information on employment, income, housing, transfer income, demographics, and family composition since 1968. Additionally, the PSID has been collecting information on health, wealth, expenditures, and philanthropy for many years.
DUST Development Process
A key motivation for this study was to enhance researchers’ ability to study time use among couples in which one or both spouses had experienced a decline in functioning. Hence, we were especially interested in being able to obtain diaries on the same days from both spouses, to allow investigations of synchronized experiences and to provide as complete a portrait as possible of flows of assistance between spouses (Kingston and Nock 1987; Michelson 2005; Sullivan 1996). The latter goal also led us to adopt a design that would capture both main and secondary activities, and, for household- and care-related activities, tailored follow-up questions about for whom the activity was carried out.
The DUST developmental process took place over 2 years (2007–2009) in three phases: focus groups and cognitive testing, assessment of interviewer interrater reliability, and two pretests. Focus groups with 19 couples aged 50 and older were facilitated by a trained moderator during the summer of 2007. Participants were asked to fill out portions of a simple paper-and-pencil diary describing the previous morning. Focus group analyses were based on notes taken from the session videotapes and a file created from the diary entries.
Cognitive testing, which followed guidelines provided in Alcser and Conrad (2007), took place during the fall of 2007 with 14 couples aged 50 and older. Couples were randomly assigned start times (8:00 a.m., 12:00 p.m., or 4:00 p.m.) and asked about four consecutive activities. Breaking from the conventional practice of using a combination of activity codes and verbatim captures, interviewers recorded all DUST cognitive testing responses verbatim on a paper-and-pencil instrument and were then directed to indicate the category that best matched each activity. The category selected then determined the sequence of follow-up questions (e.g., about where respondents were, whom the activity was for, who participated with them, who else was there, and how they felt).
Following cognitive testing, the categories from which interviewers could select (which we refer to as precodes) were simplified to nine choices. Interrater reliability of the nine precodes was assessed during a set of mock interviews held in February/March 2008. Four interviewers were instructed to complete four “full-day” diaries (consisting of 22 activities per interview), with actors playing the part of respondents. The actors read from scripted descriptions of daily activities so that different interviewers should have produced identical time diaries. Activities were purposefully selected from actual responses provided during the cognitive testing phase with an aim of representing all codes. Kappas were calculated in STATA for pairs of interviewers and then averaged. After analyzing discrepant codes and revising training materials, a second session was held with four additional interviewers and kappas were recalculated.
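The pairwise-then-average kappa calculation described above can be sketched in Python. This is a minimal illustration of Cohen's kappa computed for every pair of interviewers and then averaged; it is not the Stata routine the team used, and the interviewer labels and precode values are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohens_kappa(a, b):
    """Cohen's kappa for two coders who categorized the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("coders must rate the same non-empty set of items")
    n = len(a)
    # Observed agreement: share of items both coders placed in the same category.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement implied by each coder's marginal category distribution.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum((count_a[k] / n) * (count_b[k] / n) for k in set(count_a) | set(count_b))
    if p_e == 1.0:
        # Degenerate case: both coders used one and the same category throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(codes_by_coder):
    """Average Cohen's kappa over every pair of coders (dict: coder -> list of codes)."""
    pairs = list(combinations(sorted(codes_by_coder), 2))
    if not pairs:
        raise ValueError("need at least two coders")
    kappas = [cohens_kappa(codes_by_coder[i], codes_by_coder[j]) for i, j in pairs]
    return sum(kappas) / len(kappas)


# Hypothetical precodes (1-9) assigned by four interviewers to the same six activities.
codes = {
    "A": [9, 9, 7, 3, 1, 8],
    "B": [9, 9, 7, 3, 1, 8],
    "C": [9, 7, 7, 3, 1, 8],
    "D": [9, 9, 7, 4, 1, 8],
}
print(f"mean pairwise kappa: {mean_pairwise_kappa(codes):.3f}")
```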
Finally, two pretests were fielded, the first during the fall of 2008 with 28 couples who had participated in the developmental phases and the second during spring 2009 with seven couples who participated in the PSID core pretest. The purpose of pretest I was to conduct one set of same-day interviews with couples using the CATI diary application and paper coversheets. Pretest II was carried out to test a CATI cover-screen application developed to govern the scheduling process and to gain some experience with the time needed to schedule and complete two sets of same-day interviews.
DUST Sample and Diary Overview
DUST targeted couples in the 2009 PSID in which both spouses were at least 50 years of age as of December 31, 2008, and at least one spouse was 60 years or older at that time. Because the vast majority of married men and women 60 years and older have spouses who are 50 years and older, the sample essentially represents married people aged 60 and older and their spouses. To enhance opportunities for studying disability and care, couples in which one or both spouses had a limitation were oversampled, and strata were further divided by the husband’s age (<70, 70+). Overall, 830 PSID couples interviewed during 2009 were identified as eligible for DUST. Of the 543 eligible couples who were sampled, at least one diary was completed with 394 couples (a 73% response rate).
The DUST CATI diary, which was closely modeled after the ATUS diary, asked about all the activities occurring on the previous day, beginning at 4 a.m. and continuing until 4 a.m. on the morning of the interview. The diaries took approximately 30–40 minutes to complete on average. Up to two diaries were attempted per spouse (up to four per couple), one capturing a weekday and the other a weekend day. Respondents whose spouse could not participate because of a permanent condition (e.g., memory loss, hearing loss) were allowed to take part in DUST (N = 33), but in those cases only two diaries, rather than four, were collected. For all respondents, the first diary was paired with a 15- to 20-minute supplemental CATI questionnaire (including items on global well-being, functioning, accommodations, self-assessed memory, marital quality, secondary caregiving, and stylized time use questions). The final number of diaries completed was 1,506. Further details are available in Freedman and Cornman (2012).
Evaluation of CATI Diary Enhancements
Here we review our experience with and provide evaluative evidence for three key features of the DUST CATI diary application: (1) implementation of multiple diary, same-day design; (2) minimizing erroneous reporting of sequential activities as simultaneous without discouraging reports of multitasking; and (3) tailoring activity descriptors (or follow-up questions) that depend on a precoded activity value.
Implementation of Same-Day Design
We began with the goal of obtaining two diaries per person (four per couple), with husbands’ and wives’ interviews coordinated to take place on the same day. Our approach was designed to balance flexibility for respondents in setting appointments against the scientific need for a balanced distribution of diaries across days of the week. To account for the especially marked differences between weekday and weekend schedules, we opted to collect one diary of each type. To implement this design, we created a list of all possible combinations of weekday and weekend days. To ensure there would be no systematic differences between the first and second diaries, we also allowed the order to vary. To address concerns about nonresponse and to offer respondents some flexibility, we added to each combination an alternative pair of “backup” days alongside the two “primary” days. Interviewers were instructed to use backup days only when the primary day was never a possibility (e.g., the respondent was never available on Sunday). Eighty different combinations of primary and backup days were listed altogether. Using systematic selection, each sampled couple was assigned a number from 1 to 80, which was preloaded into the scheduling application.
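One way to arrive at exactly 80 combinations can be sketched as follows. The enumeration rests on assumptions the text does not spell out: interviews yielding weekend diaries fall on Sunday or Monday and those yielding weekday diaries on Tuesday through Saturday (consistent with the day-of-week distribution discussed later in this section), the backup weekend day is the other weekend interview day, the backup weekday is any of the four remaining weekday interview days, and either interview may come first (2 × 5 × 4 × 2 = 80). The function names and the cyclic reading of "systematic selection" are illustrative, not the DUST scheduling application.

```python
import random
from collections import Counter
from itertools import product

# Assumed interview days; each diary covers the day before its interview.
WEEKEND_INTERVIEWS = ["Sun", "Mon"]                       # diaries for Saturday, Sunday
WEEKDAY_INTERVIEWS = ["Tue", "Wed", "Thu", "Fri", "Sat"]  # diaries for Monday-Friday


def build_combinations():
    """Enumerate (primary pair, backup pair) interview-day combinations."""
    combos = []
    for weekend, weekday, weekday_backup in product(
        WEEKEND_INTERVIEWS, WEEKDAY_INTERVIEWS, WEEKDAY_INTERVIEWS
    ):
        if weekday_backup == weekday:
            continue  # a backup day must differ from its primary day
        weekend_backup = next(d for d in WEEKEND_INTERVIEWS if d != weekend)
        for weekend_first in (True, False):  # either interview may come first
            if weekend_first:
                primary, backup = (weekend, weekday), (weekend_backup, weekday_backup)
            else:
                primary, backup = (weekday, weekend), (weekday_backup, weekend_backup)
            combos.append({"primary": primary, "backup": backup})
    return combos


def primary_day_shares(combos):
    """Share of primary interview slots that fall on each day of the week."""
    counts = Counter(day for c in combos for day in c["primary"])
    total = sum(counts.values())
    return {day: n / total for day, n in counts.items()}


def assign_systematically(couple_ids, n_combos=80, seed=None):
    """Give couples combination numbers 1..n_combos by cycling from a random start."""
    start = random.Random(seed).randrange(n_combos)
    return {cid: (start + i) % n_combos + 1 for i, cid in enumerate(couple_ids)}


combos = build_combinations()
print(len(combos))
print(primary_day_shares(combos))
```

Under these assumptions, Sunday and Monday each account for a quarter of primary interview slots and each of Tuesday through Saturday for 10%, the balance reported for Table 1.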
The preloaded number represented the days of the week on which the interviews were to be scheduled, and the diary day was always the day preceding the interview, ensuring a 1-day recall period for all. That is, after the couple was called and had consented, the interviewer attempted to schedule both husband and wife for interviews on the first day in the set (e.g., Wednesday) but was flexible in meeting the couple’s needs for a specific date (e.g., June 24, July 1, or July 8) and time of day. The second interview was then scheduled using the same flexible protocol. Before interview dates and times were finalized, interviewers confirmed the availability of both spouses, either by speaking directly with the other spouse or by calling back after the spouse handling the scheduling had checked with his or her partner.
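The same-weekday flexibility can be illustrated with a short sketch using Python's standard `datetime` module. The helper names and the choice to offer three consecutive weeks starting the day after first contact are assumptions made for illustration; with a first contact on Monday, June 22, 2009, the sketch reproduces the Wednesday dates given in the text.

```python
from datetime import date, timedelta

WEEKDAY_INDEX = {"Mon": 0, "Tue": 1, "Wed": 2, "Thu": 3, "Fri": 4, "Sat": 5, "Sun": 6}


def candidate_interview_dates(first_contact, assigned_day, n_weeks=3):
    """Dates on the assigned weekday in successive weeks, starting after first contact."""
    # Days until the next occurrence of the assigned weekday; 0 becomes 7 (next week).
    days_ahead = (WEEKDAY_INDEX[assigned_day] - first_contact.weekday()) % 7 or 7
    first = first_contact + timedelta(days=days_ahead)
    return [first + timedelta(weeks=k) for k in range(n_weeks)]


def diary_day(interview_date):
    """The diary always covers the day before the interview (1-day recall)."""
    return interview_date - timedelta(days=1)


# First contact on Monday, June 22, 2009, with Wednesday as the assigned interview day.
options = candidate_interview_dates(date(2009, 6, 22), "Wed")
print([d.isoformat() for d in options])
print(diary_day(options[0]).isoformat())
```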
We examined the distribution of time between first contact and diary completion and found that 83% of first diaries and 74% of second diaries were completed within 4 weeks of first contact. Had we applied the 2-week (14-day) limit on postponement recommended by the European statistical agency, only 67% of first diaries and 42% of second diaries would have been completed.
As shown in Table 1, this approach resulted in a balanced assignment of days of the week. Sunday and Monday, which yield weekend-day diaries, represent half the sample, whereas each day that yields a weekday diary is assigned 10% of the time, both as a primary and as a backup day. That is, refusals to participate in DUST do not appear to disrupt the distribution of assigned interview days. Turning to the distribution of actual interview days, we see a slight imbalance between Sundays (22%) and Mondays (28%), although together they still represent exactly half the sample, as designed; the remaining interview days each account for 9% to 11%. (Note that diary weights have been constructed that take into account the differential probabilities of selection of weekend days and weekdays; when weighted, each day of the week accounts for 14.3% of diaries.)
Table 1. Distribution of Assigned and Actual Interview Days and Whether Primary or Backup Random Day Assigned, by Day of the Week.
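The note on diary weights can be illustrated with a deliberately simplified sketch. It assumes a post-stratification-style adjustment in which each diary is weighted by the ratio of the target share (1/7) to the observed share of diaries on its day; the actual DUST diary weights may be constructed differently and include other components. The day counts in the example are hypothetical and mirror the design shares (25% for each weekend day, 10% for each weekday).

```python
from collections import Counter

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def day_of_week_weights(diary_days):
    """Weight per diary day so each day of the week carries 1/7 of the weighted total.

    Simplifying assumption: only the day-of-week composition is adjusted.
    """
    counts = Counter(diary_days)
    missing = [d for d in DAYS if counts[d] == 0]
    if missing:
        raise ValueError(f"no diaries for: {missing}")
    n = len(diary_days)
    # Target share (1/7) divided by the observed share of diaries on that day.
    return {d: (1 / 7) / (counts[d] / n) for d in DAYS}


def weighted_shares(diary_days, weights):
    """Weighted share of diaries falling on each day of the week."""
    counts = Counter(diary_days)
    total = sum(counts[d] * weights[d] for d in DAYS)
    return {d: counts[d] * weights[d] / total for d in DAYS}


# Hypothetical 100 diaries: 25% each for Saturday and Sunday, 10% each for Monday-Friday.
days = ["Sat"] * 25 + ["Sun"] * 25 + [d for d in DAYS[:5] for _ in range(10)]
w = day_of_week_weights(days)
print({d: round(s, 3) for d, s in weighted_shares(days, w).items()})
```

Weekend diaries, being overrepresented by design, receive weights below those of weekday diaries.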
Interviewers were instructed to always schedule husbands’ and wives’ interviews on the same date. Only in rare circumstances (e.g., if one spouse cancelled after the other had already been interviewed, or at the end of the fieldwork period) were spouses interviewed on different dates, and in these cases interviewers were instructed to reschedule for the same day of the week. This approach resulted in 93% of diaries being administered on the same date to both spouses (see Table 2).
Table 2. Percentage of Diaries Administered on the Same Date.
Sequential Activities versus Multitasking
Many time use researchers have called for more attention to so-called secondary activities, those that occur at the same time as another main or primary activity. Secondary activities may be truly simultaneous, such as talking on the phone while making dinner (multitasking), or may involve very short sequential activities, such as making dinner, taking a break to talk on the phone, and then resuming making dinner.
Moreover, early in cognitive testing of the DUST diary application, we encountered respondents reporting a string of sequential activities as if they were simultaneous. When asked, “Yesterday, at [start time], what did you do next?” several participants responded with what appeared to be a list of sequential activities. In two cases the interviewer dutifully wrote them all down; in another, the interviewer went off script because she knew the activities were sequential and were not all taking place at the stated time.
Because multitasking is a legitimate behavior of interest when studying time use, we did not want to discourage reporting of it. Yet, we also clearly wanted to discourage reporting of sequential activities as if they in fact occurred simultaneously. A related and known problem we hoped to discourage was unintentional omission of travel, that is, one “leg” of a set of sequential activities.
The team devised a strategy to minimize both issues in the DUST CATI diary. Respondents were first read an introduction that used the ATUS introduction as a starting point but was further developed based on our experience with cognitive testing:
Next, we’d like to find out how you spent your day yesterday, REFERENCE DAY.
I’m going to ask you what you were doing starting at 4:00 am. Then I’ll ask a few more questions about the activity, like:
how long it took,
where you were,
who was doing the activity with you, and
who else was there.
We’ll repeat this series of questions until we reach the end of the day. If you were traveling, we’ll treat that as a separate activity. So, for instance, driving to the doctor would be separate from being at a doctor’s appointment, and then driving home would also be a separate activity. If you were doing more than one activity for the time I ask you about, that’s fine. You can tell me more than one activity for a given time.
Since multitasking is also related to the degree of activity detail, respondents were also given guidance on the granularity of their responses as follows:
Sometimes people want to know how much detail we are looking for. If you tell me you worked from 9 to 5, I may ask you to break that down for me, for example, into having meetings from 9 to 11, answering e-mails for an hour until 12, having lunch until 1, and so on. Or, if you tell me you cleaned the house all morning, I may ask for more detail, for example, you straightened up from 9 to 9:30, folded laundry for half an hour, made the beds at 10:00, and so on. On the other hand, you don’t need to tell me about changing the TV channel or walking from room to room in your house. So, somewhere in between. And if an activity is too personal, there’s no need to mention it. Ok? Let’s begin.
Respondents were then asked, “Yesterday at [4:00 AM], what were you doing?” If multiple activities were reported for a given time, interviewers were instructed to enter them on separate lines (up to five) and were then brought to a screen that said, “Just to be clear, were you doing [both / all] of these activities at [START TIME]?” If the respondent said yes, he or she was asked, “If you had to choose, which of these would you say was the main activity?” The interviewer also had the option of reading the definition: “By main activity, we mean the one that you were focused on most.” If the respondent said no, that is, that he or she was not doing both activities at [START TIME], the interviewer re-asked, “Yesterday, at [START TIME], what were you doing?” and corrected the lines as necessary. One disadvantage of this approach is the additional time needed to take the respondent through this series of questions and make corrections. We therefore also trained interviewers to use the probe “let’s break that down” if the respondent was clearly reporting sequential activities, as evidenced by words like “then” (e.g., “I made breakfast and then I sat down and ate it.”).
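The branching logic of this simultaneity check can be sketched as a small function. This is a hypothetical Python rendering of the screens described above, not the DUST CATI code; the `Episode` structure is an assumption, and the cue list for the “let’s break that down” probe extends the text’s example word “then” with two assumed phrases.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Episode:
    start_time: str
    main: str
    secondary: list = field(default_factory=list)


# Cue for the "let's break that down" probe; "then" is from the text, the others are assumed.
SEQUENTIAL_CUE = re.compile(r"\b(then|after that|afterwards)\b", re.IGNORECASE)


def needs_breakdown(verbatim):
    """True if an answer looks like a list of sequential activities."""
    return SEQUENTIAL_CUE.search(verbatim) is not None


def resolve_reported_activities(start_time, activities, simultaneous=None, main_choice=None):
    """Apply the simultaneity check to the activities reported for one start time.

    activities:   verbatim entries, one per line (one to five).
    simultaneous: answer to "were you doing [both / all] of these at START TIME?"
    main_choice:  the activity the respondent was "focused on most".
    Returns an Episode, or None when the interviewer should re-ask the question.
    """
    if not 1 <= len(activities) <= 5:
        raise ValueError("enter between one and five activities")
    if len(activities) == 1:
        return Episode(start_time, activities[0])
    if not simultaneous:
        return None  # re-ask "Yesterday, at START TIME, what were you doing?"
    if main_choice not in activities:
        raise ValueError("the main activity must be one of the reported activities")
    return Episode(start_time, main_choice, [a for a in activities if a != main_choice])


print(resolve_reported_activities("8:00 AM", ["eating breakfast", "watching TV"], True, "eating breakfast"))
print(needs_breakdown("I made breakfast and then I sat down and ate it."))
```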
Evaluation of this issue with the pilot study data took two forms: counts of corrections to secondary and/or sequential activities made during the editing stages (according to a set of rules), and analysis of the extent of multitasking and main activities in the final data. As shown in Table 3, a relatively small percentage of activities (<5%) needed to be corrected during editing for reasons related to secondary and/or sequential activities. For instance, secondary activities were eliminated in approximately 0.5% of cases because the main and secondary activities were redundant rather than truly separate activities (e.g., putting on one’s coat and hat). In another 0.5% of cases, an activity listed as secondary should have been listed as a separate, sequential activity (e.g., bathing and dressing). And in 0.5% of cases, two activities were listed as main on one line when one should have been listed as secondary (e.g., watching TV and eating). With respect to sequential activities, approximately 3% of main activities were “split off” from an existing activity, either because two activities were recorded on one line, because a secondary activity should have been sequential, or because a travel episode was missing.
Table 3. Number and Percentage of Activities Requiring Selected Edits, by Edit Type and Correction.
Importantly, this approach still yields a fair amount of multitasking. Of the 36,898 main activities reported, 5.4% involved reports of multitasking (5% involved exactly one secondary activity; 0.4% had two or more secondary activities at the same time as the main activity). Weighting by main-activity duration and adjusting for differential probabilities of selection, we estimate that among older couples 7.7% of waking time (73 minutes) involves main activities with at least one simultaneous secondary activity. By comparison, a recent ATUS working paper (Drago 2011) suggests that the ATUS methodology of allowing respondents to volunteer secondary activities captures 36 minutes a day of this behavior (across all ages) and that adults 70 years and older report more secondary activities than younger adults.
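The duration-weighted estimate can be expressed as a ratio of weighted minutes. The sketch below assumes that each main activity carries its duration, a count of simultaneous secondary activities, a sleep flag used to approximate waking time, and a diary weight; these field names are hypothetical, and the published estimate may involve additional steps.

```python
from dataclasses import dataclass


@dataclass
class MainActivity:
    minutes: float       # duration of the main activity
    n_secondary: int     # number of simultaneous secondary activities reported
    asleep: bool         # True for sleep episodes, which are excluded from waking time
    weight: float = 1.0  # diary weight adjusting for differential selection probabilities


def multitasking_share_of_waking_time(activities):
    """Weighted share of waking minutes spent in main activities with >= 1 secondary."""
    awake = [a for a in activities if not a.asleep]
    total = sum(a.weight * a.minutes for a in awake)
    multitasked = sum(a.weight * a.minutes for a in awake if a.n_secondary >= 1)
    return multitasked / total if total else 0.0


def share_of_main_activities_multitasked(activities):
    """Unweighted share of main activities with at least one secondary activity."""
    return sum(a.n_secondary >= 1 for a in activities) / len(activities)


# Hypothetical one-person diary: a night's sleep plus two waking episodes.
diary = [
    MainActivity(480, 0, asleep=True),
    MainActivity(60, 1, asleep=False),   # e.g., eating while watching TV
    MainActivity(540, 0, asleep=False),
]
print(multitasking_share_of_waking_time(diary))
```

The count-based share (5.4% in the text) and the duration-weighted share (7.7%) answer different questions, which is why both are computed.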
As noted earlier, too high a level of multitasking may signal a reporting problem, notably the inappropriate doubling up of sequential main activities rather than true multitasking. If DUST were experiencing this problem, we would expect the mean number of main activities per diary to be lower in DUST than in other studies. However, we find the opposite: for married individuals 60 years and older, the mean number of main activities per diary was 20 in ATUS and 26 in DUST.
Tailored Follow-Ups
After recording main and secondary activities in open text fields and recording/confirming the duration of the main activity, interviewers were instructed to select one of nine categories that best matched the main activity (see Table 4 for definitions). Such precodes were not designed to replace detailed activity codes, since DUST postcodes all main and secondary activities with detailed (3-digit) activity codes from open text fields. Instead, the DUST precodes were used to route the respondent to appropriate follow-up questions.
Table 4. Distribution of Original and Final Precode Values and Resulting Missing Information.
For most activities (precode 9), follow-up questions include: “Where were you while you were doing that?” “Who did that with you?” [If not in a public place] “(Besides [person already mentioned],w/W)ho else was [location] with you?” and “How did you feel while you (were) ___?” To capture flows of support, for household chores/helping and caring activities (precodes 7 and 8), we added an additional question modeled after one used in the New Zealand Time Use Study: “Who did you do that for?” For “who” questions a tailored list of responses was allowed, reflecting the respondent’s household composition (which can be linked through an ID in the Household File to person level core PSID information) as well as generic relationship categories for people outside the household.
Note that this series of follow-up questions distinguishes three types of interactions with other people: doing an activity for another person (who for), with another person (who active), and in the presence of another person who is not engaged in the activity (who passive). Table 5, which includes the subset of activities with precodes 7 (household care) and 8 (care for another person), suggests these concepts are indeed distinct. For example, over 75% of these activities are performed with no one else actively engaged, but only 18% are carried out with no one else engaged and no one else present. Most such activities are reported as being done only for the respondent (41%), for all household members (35%), or for the spouse (27%); these percentages sum to more than 100 because more than one response was possible.
Table 5. Distribution of Responses to Tailored Follow-up Questions for Household Activities and Care for Others: Comparison of Who Active, Who Passive, and Who For.
Note: 18.4% of activities involved no one else actively engaged and no one else present (76.7% with no one actively engaged × 24.0% of those with no one else there).
For other precode categories, such as work (5) and grooming (2), there were fewer follow-ups, and for travel (3 and 4) and talking to others (6), follow-ups were tailored to the activity (e.g., for travel, “How did you get there?” and “Who did you pick up or drop off?”; for talking, “Who were you talking with?” and “Was that on the phone or in person?”). A report of sleeping (1) as the first or last activity of the day received follow-up questions about the quality of that night’s sleep.
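The routing described in this section can be sketched as a lookup from precode to follow-up questions. This is an illustrative Python sketch, not the DUST instrument: question identifiers are hypothetical, the position of the “who for” item is assumed, the reduced follow-up set for grooming (2) and work (5) is a placeholder because the text does not list it, and sleep episodes other than the first or last of the day are assumed to receive no follow-ups.

```python
# Hypothetical identifiers for the standard follow-ups (precode 9).
STANDARD = ["where", "who_with", "who_else", "feeling"]


def follow_ups(precode, first_or_last_of_day=False, in_public_place=False):
    """Return the follow-up questions routed by a main-activity precode (1-9)."""
    if precode == 9:                         # most activities
        questions = list(STANDARD)
    elif precode in (7, 8):                  # household chores/helping, care for others
        questions = STANDARD + ["who_for"]   # position of "who for" is assumed
    elif precode in (3, 4):                  # travel
        questions = ["how_get_there", "who_pick_up_drop_off"]
    elif precode == 6:                       # talking to others
        questions = ["who_talking_with", "phone_or_in_person"]
    elif precode in (2, 5):                  # grooming, work: placeholder reduced set
        questions = ["where", "who_with"]
    elif precode == 1:                       # sleep: quality items only at day's start/end
        questions = ["sleep_quality"] if first_or_last_of_day else []
    else:
        raise ValueError(f"unknown precode: {precode}")
    if in_public_place and "who_else" in questions:
        # "Who else was there?" is skipped when the respondent was in a public place.
        questions.remove("who_else")
    return questions


print(follow_ups(7))
print(follow_ups(9, in_public_place=True))
```

Routing on a single interviewer-assigned value keeps the branching simple, but it also means an incorrect precode can leave some follow-ups unasked.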
Reliable precoding of the main activities is critical to the success of this approach. In round 1 of the mock interviews, there was perfect agreement across all four interviewers in 71 (81%) of 88 cases coded. The average kappa was .905, which suggests a high level of reliability after about 40 minutes of training. We analyzed patterns of discrepant cases, revised the training, and in round 2 of the mock interviews found perfect agreement across all four interviewers in 74 (84%) of 88 cases. Kappa values were again quite high, approaching or exceeding .90. For pretests I and II, the lead author coded all activities and compared those codes with the interviewers’ codes. Agreement was 91% in pretest I. After training was further enhanced based on an analysis of discrepant code patterns, agreement reached 97% in pretest II.
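The two agreement measures used in this paragraph, unanimous agreement across all interviewers and percent agreement with a reference coder, can be computed as follows. The function names and toy codes are hypothetical; the final line only checks the percentages reported for the mock-interview rounds.

```python
def unanimous_share(codes_by_coder):
    """Share of items on which every coder assigned the same code."""
    columns = list(zip(*codes_by_coder.values()))
    if not columns or any(len(c) != len(columns) for c in codes_by_coder.values()):
        raise ValueError("every coder must code the same non-empty set of items")
    return sum(len(set(col)) == 1 for col in columns) / len(columns)


def percent_agreement(reference, coder):
    """Share of items on which a coder matches a reference coder (e.g., the lead author)."""
    if len(reference) != len(coder) or not reference:
        raise ValueError("codes must cover the same non-empty set of items")
    return sum(r == c for r, c in zip(reference, coder)) / len(reference)


# Hypothetical precodes from three interviewers for four activities.
mock = {"A": [9, 7, 3, 1], "B": [9, 7, 4, 1], "C": [9, 7, 3, 1]}
print(unanimous_share(mock))  # 3 of 4 activities coded identically by everyone

# Check of the reported mock-interview figures: 71 and 74 unanimous cases out of 88.
print(f"round 1: {71 / 88:.0%}, round 2: {74 / 88:.0%}")
```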
As a final step in the postproduction editing process, we evaluated the precodes by comparing them with the more detailed postcodes assigned based on open text fields. For example, activities given the detailed code for watching TV should always be precoded 9. After the pilot study, we found that 9.8% of the 36,898 activities reported needed to have their precoded value changed to be consistent with the more detailed 3-digit postcodes (Table 4). However, most precode changes did not result in missing data: overall, only 2.4% of activities had missing data as a result of an incorrect initial precode. For example, even though the work precode had to be assigned during postproduction to 22% of work activities, this correction resulted in missing follow-up information for only 4% of work activities.
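The postproduction check, and why a corrected precode need not create missing data, can be sketched as a set difference: follow-ups are missing only if the corrected precode requires questions that the original precode never triggered. The crosswalk labels, the follow-up sets (including the assumption that work's reduced set is a subset of the standard set), and the function names are hypothetical.

```python
# Hypothetical crosswalk from detailed postcode labels to the precode each implies.
POSTCODE_TO_PRECODE = {"watching_tv": 9, "paid_work": 5, "laundry": 7}

# Hypothetical follow-up sets; work's reduced set is assumed to be a subset of the standard set.
STANDARD = {"where", "who_with", "who_else", "feeling"}
FOLLOW_UPS = {9: STANDARD, 5: {"where", "who_with"}, 7: STANDARD | {"who_for"}}


def audit_precode(original_precode, postcode):
    """Return (corrected precode, follow-ups required but never asked)."""
    corrected = POSTCODE_TO_PRECODE[postcode]
    if corrected == original_precode:
        return corrected, set()
    # Missing data arise only for questions the original routing did not trigger.
    return corrected, FOLLOW_UPS[corrected] - FOLLOW_UPS[original_precode]


def summarize(records):
    """records: (original precode, postcode) pairs -> (share changed, share with missing data)."""
    records = list(records)
    audits = [audit_precode(p, code) for p, code in records]
    changed = sum(corrected != p for (p, _), (corrected, _) in zip(records, audits))
    missing = sum(bool(gap) for _, gap in audits)
    return changed / len(records), missing / len(records)


print(audit_precode(9, "paid_work"))  # work precoded 9: corrected to 5, nothing missing
print(audit_precode(9, "laundry"))    # chore precoded 9: corrected to 7, "who_for" missing
```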
Discussion
This article has reviewed the development and implementation of a new multiple, same-day diary CATI interview. Although CATI time diaries are not new, the DUST design offers several features that, upon evaluation, appear ripe for adoption by other studies: procedures to obtain multiple, same-day diaries from multiple family members; an approach to minimizing erroneous reporting of sequential activities as simultaneous without discouraging reports of multitasking; and use of tailored activity descriptors that depend on a precoded activity value determined by the interviewer.
Our experience developing and fielding the PSID supplement on DUST suggests that same-day diaries can be successfully collected from couples in a way that yields a random distribution of diary days if strategies are put into place that permit flexibility in scheduling. We also demonstrated that it is possible to minimize erroneous reporting of sequential activities as simultaneous without discouraging the reporting of true multitasking. Finally, our method of precoding each activity to tailor follow-up descriptors succeeded in that it yielded very low levels of missing data (2.4% of activities).
There are several limitations to this analysis worth noting. It is possible that our experience with older adults would not be replicated in a general population survey. Whether diaries are easier or more difficult to collect from this population is not clear. Older adults may have fewer time constraints on average than other age groups and thus be easier to interview; alternatively, collecting such information may be harder if reconstructing the day is a more challenging cognitive task for older adults. We were not able to investigate age differences with the data at hand. Nor could we investigate whether allowing flexibility in scheduling (to the same day in future weeks) biased diaries toward busier or less busy weeks. However, flexible scheduling does not appear to have reduced the number of activities reported relative to a comparable sample from ATUS, and most diaries were completed within 4 weeks of initial contact.
Despite these limitations, the particular features described in this article make it possible to study in rich detail both time use and flows of assistance within older couples. Because DUST is linked to a much longer, ongoing panel study, researchers will also be able to explore how time use varies as a function of the longer-run health, disability, and economic status of the family. In addition, linkages to future PSID waves will allow analysis of the implications of time use for a variety of later-life health outcomes.
The initial evaluations conducted here also suggest that such features may hold promise for many other substantive topics whose analyses rely on CATI-based time diary collections. Ensuring that CATI diaries are administered to family members on the same day is of interest, for instance, for studying intrafamily allocation of time, including synchronization and substitution of market and nonmarket time and the sequencing of family events over the day. Introducing careful distinctions between simultaneous and sequential activities is especially relevant for studying child care, social interactions, eating, use of media, and other activities commonly carried out as secondary activities. Finally, by allowing more in-depth follow-up questions, minor adaptations to the tailored precode approach developed here could be used to explore numerous other time-use–related topics, such as sleep quality, children’s interactions with media, the nature of physical activity, or flows of assistance to and from family members, to name a few. Should other CATI-based diary studies adapt the strategies presented here to fit their specific research goals, the findings in this article may be a useful point of comparison for future evaluative efforts.
Acknowledgments
An earlier version of this article was presented at the American Time Use Research Conference, June 25–26, 2009, at the University of Maryland, Adelphi, MD. This research was funded by the National Institute on Aging, P01 AG029409-04. The views expressed are those of the authors alone and do not represent their employers or funding agency.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the National Institute on Aging, P01 AG029409-04.
