Abstract
The Autobiographical Memory Test (AMT) is the most commonly used tool to assess the phenomenon of overgeneral memory. The AMT has mainly been used in adult populations, and its use with preschool children is rare. The need for an appropriate instrument to study memory specificity in the preschool years led us to develop an AMT version adapted for early childhood. The AMT–Preschool (AMT-P) was administered to a sample of preschool children aged between 3 and 6 (N = 364). The results suggest that the AMT-P functions differently in preschoolers depending on age. With children aged 53 months or older, results suggest that the AMT-P is appropriate for assessing overgenerality. Nevertheless, with younger children, the task is more difficult. These results concur with previous research suggesting that the ability to recall specific memories is consolidated from the age of 4½.
Keywords
Autobiographical memory (AM) refers to declarative memory for the retrieval of significant personally experienced events (Bauer, Larkina, & Deocampo, 2011; Conway & Pleydell-Pearce, 2000; Williams et al., 2007). AM emerges gradually during the preschool years through processes of social interaction and cognitive development and continues to develop throughout childhood and adolescence (Fivush, 2011; Nelson & Fivush, 2004). Although the origin and development of AM differ according to each individual, it is commonly agreed that the factors critical to its development include linguistic, social, cultural, cognitive, and emotional aspects (Howe, 2014).
At a theoretical and empirical level, autobiographical memories are thought to serve three basic functions: directive, self, and social (Bluck, Alea, Habermas, & Rubin, 2005). Importantly, all three functions are transcultural (Alea & Wang, 2015) and are related to individual well-being (Waters, 2013).
AM takes a hierarchical form dependent on its level of specificity (Conway & Pleydell-Pearce, 2000). Specific autobiographical memories are memories associated with single unique events that took place in a specific place and time and lasted for a day or less (e.g., my wedding day). Memories referring to a whole class of events generally stored in categories such as persons, places, and activities are called categoric memories (e.g., every argument with my wife), and memories that refer to an extended period in time are called extended memories (e.g., when I was at school). A number of authors suggest that the function a memory comes to serve may differ according to the type of event recalled (specific, categoric, or extended events; Waters, Bauer, & Fivush, 2014). Specifically, specific events score high on self- and directive functions compared with recurring events, whereas categoric events score high on social function. Finally, extended events are found to score high on all functions compared across event types.
Difficulty in retrieving specific memories, with categoric or extended memories recalled instead, is known as overgeneral memory (OGM; Williams et al., 2007). OGM has become one of the most studied aspects of AM. Research into this phenomenon has increased due to evidence of its relationship to certain forms of psychopathology, such as posttraumatic stress disorder or depressive disorders (Moore & Zoellner, 2007; Williams et al., 2007). Most of this research has been conducted with adult populations (Moore & Zoellner, 2007; Sumner, 2012). However, very little research has been conducted with normative samples to study the capacity for retrieving specific memories in preschoolers (Nuttall, Valentino, Comas, McNeill, & Stey, 2014).
Different procedures exist for assessing AM in children, such as those based on standardized open-ended memory questions about both the recent and the more distant past concerning autobiographical events common in young children’s lives (e.g., Han, Leichtman, & Wang, 1998; Wang, 2004, 2008). Some tasks even include an additional future component (Wang, Capous, Koh, & Hou, 2014; Wang, Hou, Tang, & Wiprovnick, 2011). However, the most commonly used tool for assessing the difficulty in retrieving specific memories is the Autobiographical Memory Test (AMT; Williams & Broadbent, 1986). The AMT uses cuing methodology. Participants are presented with cue words and are asked to produce a unique, specific memory that the cue words remind them of within a given time limit (e.g., 1 minute). Participants are also asked to generate a different memory for each cue word. The cue words usually differ in valence, with most studies including both positive and negative words (Griffith, Sumner, et al., 2012). Before beginning the AMT trials, participants are provided with the definition of a specific memory and complete practice trials to ensure they understand the instructions. Responses on the AMT are coded into one of the following categories: specific memories, extended memories, categoric memories, semantic associates, or nonmemories (see Measures section for an explanation of each category). The AMT has been used successfully with children older than 7 years and in adults (Valentino, Bridgett, Hayden, & Nuttall, 2012; Valentino, Toth, & Cicchetti, 2009). In children aged younger than 7 years, AM has basically been assessed by analyzing the level of detail in memories retrieved by children using the following procedure: The parents or examiners select a memory regarding a specific event experienced by the child. This memory is then discussed and analyzed to assess the level of detail in the information received (Fivush, Haden, & Reese, 2006; Reese & Newcombe, 2007). 
Nevertheless, there are very few studies using the AMT with preschool-age samples (Nuttall et al., 2014). Consequently, there is a need to design tasks that assess OGM, and thus AM specificity, in early childhood. Valentino (2011) recommends a developmental psychopathology approach to the study of AM specificity. This approach is relevant to the understanding of developmental processes and outcomes at different levels of complexity in both typical and nontypical child development, including biological, cognitive, affective, social, and cultural aspects (Cicchetti, 1984).
As suggested by various authors (Hitchcock, Nixon, & Weber, 2014; Valentino, 2011), children’s processes and disorders should not be regarded as simplified versions of those of adults. It is important to analyze the origin, development, and distinctive characteristics of OGM in early ages. Consequently, research should focus on designing a task for assessing the early capacity to retrieve specific memories and so establish its baseline development. To this end, a version of the AMT that is appropriate for use with preschool children is needed.
Griffith, Kleim, Sumner, and Ehlers (2012) address the importance of methodological issues in the AMT. These authors believe that the methodology used in the different versions of the AMT can affect the interpretation of the results. They specifically highlight the importance of assessing its psychometric properties. Recent studies have focused on this issue. Griffith et al. (2009), Griffith, Kleim, et al. (2012), and Heron et al. (2012) point to the existence of a one-factor structure in the specificity of AM. This factor includes the memories for both positive and negative cue words. Raes, Williams, and Hermans (2009) assessed the test–retest reliability of the AMT in college undergraduates and depressed inpatients. The intervals between baseline and follow-up ranged from 1 to 5 months. Their results found correlations ranging from .53 to .68. Griffith, Kleim, et al. (2012) examined the internal consistency of the AMT and found a reliability coefficient of .72 (95% CI [.67, .77]).
To our knowledge, the study by Nuttall et al. (2014) is the first to use a version of the AMT (Autobiographical Memory Test–Preschool Version [AMT-PV]) with children aged between 4 and 6 years. The AMT-PV consisted of 10 cue words (happy, mad, surprised, sad, lucky, scared, strong, tired, smart, and hungry). The words were presented alternating between positive and negative valences. With regard to the factor structure of the AMT-PV, their results coincide with those of previous studies (Heron et al., 2012), which suggest the existence of a one-factor structure of AM specificity. The psychometric assessment of the AMT-PV, using item response theory (IRT; Lord & Novick, 1968), concludes that the task is a reliable measure of AM specificity in preschoolers.
This article aims to provide additional evidence to the research conducted by Nuttall et al. (2014) on the validation process of the AMT in preschool populations. In this regard, there are some important differences between our work and that of Nuttall et al. First, Fivush and Nelson (2004) suggested that the basic ability to retrieve and report specific memories begins to develop at the age of 3 but does not stabilize until the age of about 4½ (Bruce, Dolan, & Phillips-Grant, 2000). For this reason, our sample consisted of preschool children aged between 3 and 6 years old. Second, Nuttall et al. (2014) included developmentally appropriate cue words for preschoolers but did not indicate how these cue words were selected. As suggested by Griffith, Sumner, et al. (2012), there is no generally accepted standardized set of stimuli for the AMT and different cue words can lead to different results. Given that our study was conducted in a different country and some participants were younger than Nuttall et al.’s, we thought it necessary to develop an AMT version (AMT-P) adapted to the age of our sample and to the sociocultural characteristics of preschool education in Spain (see Measures section for an explanation of the words selection procedure). We consider it important to develop AMT versions adapted to preschool populations in different countries. Several studies have shown that the characteristics of the educational settings in different countries influence children’s cognitive and language development (e.g., Cryer, Tietze, Burchinal, Leal, & Palacios, 1999). For example, Montie, Xiang, and Schweinhart (2006) found that increased adult–child interaction in preschool is related to better language scores in countries with less group response and adult-centered teaching, and poorer language scores in countries with more group response and adult-centered teaching. 
Given that language is one of the critical factors for the emergence and development of AM (Fivush & Nelson, 2004), it could be expected that differing educational styles influence AMT performance in preschool samples. For this reason, we have developed an adapted version of the AMT (AMT-P) for preschool Spanish populations. Third, Nuttall et al. (2014) presented the cue words orally and visually. Their work provided no other information about how the visual presentation was conducted (e.g., Did the children see written words or pictograms?). In our work, to facilitate children’s understanding of the cue words, they were presented orally, accompanied by pictograms (see Measures section for an explanation of the pictograms selection procedure). Finally, in Nuttall et al.’s study, AMs were coded as specific or general. In our study, the memories were coded according to five different categories (specific, categoric, extended, semantic associate, and nonmemory) following the criteria described by Williams (1992). This coding enables us to assess how each memory type changes with age.
In the development of our AMT version, we have two main aims. First, following the approach of previous studies, we expect to find that a one-factor model produces the best fit to the data. Second, to assess whether the selected cue words are appropriate for Spanish preschool populations, we use IRT to analyze the discrimination and difficulty parameters of the AMT-P cue words and the general level of the information provided by the test.
Method
Participants
Preschool children from seven state schools and two state-funded schools located in urban areas took part in this study. Participants were recruited by the school counsellors from the schools involved in the study. A total of 98% of the pupils in the different schools took part. The remaining 2% did not participate due to lack of parental consent. The final sample comprised 364 participants (47% boys; ages 39-78 months; M = 55.44, SD = 10.55) from the 3 years of Spanish Preschool Education: Age-Group 1 (N = 158; 45.6% boys; ages 39-52 months; M = 45.37, SD = 3.41), Age-Group 2 (N = 109; 45.9% boys; ages 53-63 months; M = 57.28, SD = 3.19), and Age-Group 3 (N = 97; 50.5% boys; ages 64-78 months; M = 69.77, SD = 3.51). The children were all from families of medium-/high-socioeconomic status with an annual income ranging from €25,000 to €56,000.
As previously mentioned, certain mental disorders such as depression or posttraumatic stress disorder can affect specific memory retrieval, leading to OGM. Furthermore, a number of studies show that communication disorders and generalized developmental disorders are the most prevalent problems (11.11% in each case) in the Spanish preschool population (e.g., Navarro-Pardo, Meléndez, Sales, & Sancerni, 2012). Given that these mental disorders can have an impact on performance in the AMT, due to either an OGM effect or their effect on the acquisition of communication and language skills, a prior diagnosis of mental disorder was established as an exclusion criterion. Information on this was provided by the school educational guidance teams, who are responsible for writing pupils’ psychological reports. These teams are coordinated by the corresponding Children’s Mental Health Units and so have access to clinical information on pupils. No child was excluded from the study due to this criterion.
Measures
Autobiographical Memory Test–Preschoolers
The AMT-P adapts the procedure of the task designed by Williams and Broadbent (1986) to use with preschoolers. The AMT-P consists of 10 words. Five are positively valenced (happy, loving, be friends, share, and play), and five are negatively valenced (sad, angry, take away, argue, and hit). It is important to note that in Spanish, the 10 items are one-word stimuli (feliz, triste, cariñoso, enfadado, compartir, quitar, jugar, discutir, amigos, and pegar), but when translated into English, some words underwent changes in morphology and compound words were used. For example, take away corresponds to the Spanish one-word item quitar.
Selection of Cue Words
The aim was to adapt the original AMT designed by Williams and Broadbent (1986), selecting words that would be comprehensible for preschool-age children. The words used in the original AMT reflected emotions, so we attempted to select words reflecting basic emotions that preschoolers would have acquired already. A number of studies show that 3-year-olds can identify facial expressions such as happy, angry, and sad (Denham, 1986; Widen & Russell, 2003) and are more adept at identifying emotions that occur in their known environments (Fabes, Eisenberg, Nyman, & Michealieu, 1991). Consequently, with the collaboration of teachers and school counsellors, we selected emotions that commonly appear in familiar settings such as the school and the home. We selected pairs of opposing emotions since, as suggested by Denham and Couchoud (1990), preschoolers are more accurate at identifying the correct emotion if the two emotions potentially evoked by the situation are of a different rather than the same valence (e.g., happy and sad vs. sad and angry). After selecting the cue words, a pilot study was conducted with 15 children (5 children from each year group) to check whether they were able to understand the procedure and retrieve memories associated with each word. This pilot study was also used to validate the practice words (bike and story). The results of the pilot study were considered satisfactory since the examiners found that (a) instructions were understood by 100% of the preschoolers participating in the study and (b) specific memories were recalled for between 30% and 100% of the words, including the practice ones.
Selection of Pictograms
Each word was associated with a pictogram. The pictograms for the words happy, sad, angry, play, and argue were taken from the Spanish version of the system designed by the International Society of Augmentative and Alternative Communication (Warrick, 1998). If a word did not clearly match any pictogram from there, the school counsellors selected various pictures taken from teaching material designed by the schools for use with preschoolers. A vote was then taken on which picture best represented the word. The pictograms were evaluated by five school counsellors and five preschool teachers. This selection panel voted positively if they considered they would choose a particular pictogram to present a word to their own pupils. Figure 1 shows examples of the pictograms used.

Figure 1. Examples of pictograms from the Autobiographical Memory Test–Preschoolers.
Coding the Autobiographical Memories
The memories were audio-recorded and subsequently transcribed and coded following the criteria described by Williams (1992). Memories were coded according to level of specificity. Events associated with a specific place and moment lasting less than a day (e.g., “when I met my cousin at the theatre”) were classified as specific memories. Events or situations that were repeated during a certain period of time (e.g., “when I went to swimming classes in the afternoons”) were classified as categoric memories. Extended memories referred to events that took place over a prolonged period of time (e.g., “when I was on holiday at the beach”). Responses consisting of the names of people, animals, or things were classified as semantic associations (e.g., “mummy”). If the cue word generated no response, the response did not correspond to the cue, or the response did not refer to an AM (e.g., “I want to have a dog”), the cue was classified as a nonmemory. To avoid memories being invoked by the examiner, only the child’s first response to a cue word was considered.
The memories were coded by two separate examiners blinded to the objectives of the study and who had not participated in the data collection. Interexaminer reliability for memories was 93.3%.
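The agreement figure reported above can be illustrated with a short sketch. The source does not state which index was used, so simple percent agreement is an assumption here; Cohen's kappa is included only as a common chance-corrected alternative. The function names and the example codes are hypothetical and are not the study's data.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Percentage of responses that two coders placed in the same category."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must rate the same non-empty set of responses")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(codes_a)
    observed = percent_agreement(codes_a, codes_b) / 100.0
    counts_a, counts_b = Counter(codes_a), Counter(codes_b)
    # Chance agreement: product of the two coders' marginal proportions per category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes for six responses (illustration only).
coder_1 = ["specific", "categoric", "specific", "nonmemory", "semantic", "specific"]
coder_2 = ["specific", "categoric", "extended", "nonmemory", "semantic", "specific"]

agreement = percent_agreement(coder_1, coder_2)
kappa = cohens_kappa(coder_1, coder_2)
print(f"agreement = {agreement:.1f}%, kappa = {kappa:.2f}")
```

In this made-up example the coders agree on five of six responses; kappa is lower than raw agreement because part of that agreement would be expected by chance.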
Procedure
The schools participating in the study invited the preschoolers’ parents or legal guardians to attend an informational meeting. Informed consent was obtained from those who wanted their children to take part in the study.
Data collection was conducted by nine experimenters. Participants were individually assessed during the school day, outside the classroom but within the school building. The assessment was conducted in a single session in which the AMT-P was applied. Each cue word was presented in association with a pictogram. The words and pictograms were presented in a fixed order, alternating between positive and negative cues. The experimenter asked the children to verbally generate a memory associated with the word. The concept of specific memory had previously been illustrated by use of examples. All participants were given two practice words (bike and story) to ensure understanding and clarify any doubts. After the practice words, the task started with the presentation of the first word and its corresponding image. After each word, the children were given the following instructions: “Think of a specific moment when you felt/were/had . . . and tell me what happened.” They were given a minute to answer. At no point did the experimenter help the children retrieve specific memories.
The session lasted approximately 30 minutes. After finishing the task, the children were rewarded with a sticker or candy and then returned to class.
Data Analysis
First, a confirmatory factor analysis (CFA) was conducted to test whether the score obtained on the AMT-P represented a single dimension of the specificity of AM. Second, a descriptive analysis of the data was conducted using SPSS 20.0.
Finally, the psychometric properties of the AMT-P were examined using IRT. This theory is based on the notion that an individual’s performance on a test item can be predicted by latent traits (Hambleton, Swaminathan, & Rogers, 1991). IRT and CFA can be unified in a single mathematical framework so the factor structure of the AMT and the IRT parameters of individual items can be examined simultaneously. The CFA and the IRT analyses were conducted using Mplus 6.12 software (Muthén & Muthén, 1998-2011).
Results
Descriptive Statistics
Tables 1 and 2 show the distribution of responses and the percentages for each of the AMT items for age ranges 39 to 52 months and 53 to 78 months, respectively.
Table 1. Distribution of Responses for the 10 Autobiographical Memory Test Items in the 39- to 52-Month Age-Group (n = 158).
Table 2. Distribution of Responses for the 10 Autobiographical Memory Test Items for the 53- to 78-Month Age-Group (n = 206).
Confirmatory Factor Analysis
In this analysis, the variance of each factor was set at 1.0 so that the loading of each AMT item could be freely estimated. Each of the items was treated as an ordinal categorical indicator, and so a weighted least squares means and variance adjusted estimation was used. The comparative fit index (CFI; Bentler, 1990; Hu & Bentler, 1999) and root mean square error of approximation (RMSEA; Browne & Cudeck, 1993) were used to evaluate the model fit. For well-fitting models, Hu and Bentler (1999) suggest cutoff points for CFI of ≥0.95 and for RMSEA of <0.06. Regarding χ2, as the significance of the χ2 statistic is influenced by the sample size, some authors suggest the use of chi-square/degrees of freedom (χ2/df) ratio as a better measure of the goodness of fit of the overall model (Byrne, 2001). Tabachnick and Fidell (2001) recommend a χ2/df ratio <2 for well-fitting models.
A one-factor CFA was conducted including both the positive and negative cue words. This model did not fit the observed data adequately: although the CFI met its cutoff, the RMSEA and the χ2/df ratio exceeded theirs (CFI = 0.98, RMSEA = 0.07), χ2(35) = 89.84, p < .001, χ2/df = 2.57. Various authors note that although autobiographical memories exist at age 3, it is only from age 4½ that AM becomes more continuous (Bruce et al., 2000; Fivush & Nelson, 2004). This could explain why the one-factor model does not fit well, since the sample includes children from the 39- to 52-month age-group. For this reason, the CFAs were conducted separately for the children in the 39- to 52-month age-group and those in the 53- to 78-month group. The results of the CFA carried out with the 53- to 78-month sample provided a good fit to the observed data (CFI = 0.96, RMSEA = 0.059), χ2(35) = 61.04, p = .004, χ2/df = 1.74.
However, the results of the CFA conducted with the 39- to 52-month sample did not give a good fit (CFI = 0.93, RMSEA = 0.07), χ2(35) = 59.32, p = .006, χ2/df = 1.69. A more detailed analysis of the results showed that the cue word sad had a factor loading below 0.4. Furthermore, if we consider the distribution of memories across the different AMT-P cue words, “sad” shows a higher number of specific and categoric memories and a lower percentage of semantic associations than the other cue words. This difference in the distribution of memories could explain why its factor loading in the AMT-P is lower than that of the other words. For this reason, we decided to omit it. Without this item, the model had a good fit to the observed data (CFI = 0.95, RMSEA = 0.059), χ2(27) = 41.89, p = .033, χ2/df = 1.55.
Additionally, we attempted to fit a two-factor model, that is, one factor formed by memories in response to positive cue words and another formed by memories in response to negative cue words. In both samples, the model could not be empirically identified because some of the estimated variances were negative.
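The fit criteria described at the start of this section (CFI ≥ 0.95, RMSEA < 0.06, χ2/df < 2) can be applied mechanically to the four one-factor models reported above. The following is a minimal sketch; the class and function names are hypothetical, and only the index values and cutoffs come from the text.

```python
from dataclasses import dataclass


@dataclass
class FitIndices:
    """Fit statistics of one CFA model, as reported in the text."""
    label: str
    cfi: float
    rmsea: float
    chi2: float
    df: int

    @property
    def chi2_df_ratio(self) -> float:
        return self.chi2 / self.df


def check_fit(fit, cfi_min=0.95, rmsea_max=0.06, ratio_max=2.0):
    """Return which cutoffs a model meets (Hu & Bentler; Tabachnick & Fidell)."""
    return {
        "cfi": fit.cfi >= cfi_min,
        "rmsea": fit.rmsea < rmsea_max,
        "chi2_df": fit.chi2_df_ratio < ratio_max,
    }


models = [
    FitIndices("full sample, 10 items", 0.98, 0.07, 89.84, 35),
    FitIndices("53-78 months, 10 items", 0.96, 0.059, 61.04, 35),
    FitIndices("39-52 months, 10 items", 0.93, 0.07, 59.32, 35),
    FitIndices("39-52 months, 9 items (sad omitted)", 0.95, 0.059, 41.89, 27),
]

results = {m.label: check_fit(m) for m in models}
for m in models:
    verdict = "good fit" if all(results[m.label].values()) else "inadequate fit"
    print(f"{m.label}: chi2/df = {m.chi2_df_ratio:.2f}, {verdict}")
```

Under these cutoffs, only the 10-item model for the older group and the 9-item model for the younger group meet all three criteria, which matches the conclusions drawn in the text.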
Item Response Theory Analyses
IRT is a model-based framework that assumes that examinees’ responses depend on both an underlying latent trait and item parameters (Lord & Novick, 1968). Consequently, rather than simply obtaining a total test score, with IRT analysis the item parameters and participant ability levels can be estimated simultaneously. The IRT parameters were calculated using Samejima’s graded response model (Baker & Kim, 2004). Each of the observed items was related to a latent trait of AM specificity using probit regression analysis. This type of analysis provides two types of parameters: item slopes (or discrimination parameters) and thresholds (or difficulty parameters). The first parameter provides a measure of the item’s ability to discriminate between low and high scorers in the latent trait AM specificity. The second parameter indicates the level of the latent trait at which a person becomes likely to give a response in a higher category. In our case, there are five response categories, so our results present four thresholds.
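To make the roles of the slope and the thresholds concrete, the following sketch computes category probabilities under a graded response model with a probit link. It assumes the common parameterization P(Y ≥ k | θ) = Φ(a(θ − b_k)), which may differ from the exact scaling used by the estimation software; the function names and the parameter values are hypothetical and are not taken from the study's tables.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probabilities(theta, slope, thresholds):
    """Probability of each ordered response category for one item.

    theta      -- latent trait level (here, AM specificity)
    slope      -- item discrimination (a)
    thresholds -- strictly increasing difficulty parameters (b_1 < ... < b_K-1)
    Returns a list of K probabilities, lowest category first.
    """
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities P(Y >= k), bounded by 1 below and 0 above.
    cumulative = [1.0]
    cumulative += [normal_cdf(slope * (theta - b)) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative ones.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical item: five categories, so four thresholds (illustration only).
slope = 1.2
thresholds = [-1.5, -0.8, -0.3, 0.0]

probs_average = category_probabilities(0.0, slope, thresholds)
probs_high = category_probabilities(1.0, slope, thresholds)
print([round(p, 3) for p in probs_average])
print([round(p, 3) for p in probs_high])
```

With these illustrative values, a child at the mean of the trait has a probability of 0.5 of reaching the highest category, because the last threshold sits at θ = 0; raising θ shifts probability toward the highest category. An item that generates only four response categories would simply take three thresholds.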
Item slope and threshold parameters estimates are shown in Table 3 for the 53- to 78-month age-group and in Table 4 for the 39- to 52-month age-group. As regards the first group, the results show standardized item slopes of .45 or higher and standardized thresholds ranging from −1.53 to 0.13. Figure 2 shows the item response and item information functions for the cue words happy and argue (the easiest and the most difficult cue words, respectively). In general, the items present a good level of discrimination since the response functions are not excessively skewed to the left, meaning the probability of participants responding with a specific memory is not unduly high. The results of the item information functions are similar: Most of the items show the information function peak above 0.4, so they provide an adequate amount of information. The items happy and share present the lowest levels of information of all the cue words. The fact that their response functions are slightly to the left and their information function’s peaks are lower than 0.4 suggests that both items are easy. However, broadly speaking, their values are acceptable. Eliminating them would significantly impair the model, and so the words were kept in the task.
Table 3. Item Response Theory Parameters for the 53- to 78-Month Age-Group.
Note. All item slopes are statistically significantly different from zero (p < .001).
These items did not generate extended memories. Consequently, the values shown in the table refer to the threshold from a categoric memory to a specific one.
Table 4. Item Response Theory Parameters for the 39- to 52-Month Age-Group.
Note. All item slopes are statistically significantly different from zero (p < .001).
These items did not generate extended memories. Consequently, the values shown in the table refer to the threshold from a categoric memory to a specific one.

Figure 2. Item response functions and item information functions for two selected items from the one-dimensional response model for children from 53 to 78 months.
As regards the 39- to 52-month age-group, the results show standardized item slopes of .47 or higher and standardized thresholds ranging from −1.03 to 1.58. Figure 3 shows the item response and item information functions for the cue words happy and loving (the easiest and the most difficult cue words, respectively). In general, the items present a good level of discrimination since the response functions are not excessively skewed to the right, meaning the probability of participants responding with a specific memory is not unduly low. Since the response functions of this age-group are shifted to the right compared with those of the 53- to 78-month age-group, the probability of responding with a specific memory is lower in preschoolers younger than 4½ years than in older ones. The item information functions show that only three items (loving, share, and hit) have an information function peak above 0.4. The fact that the response functions of the remaining items are slightly to the right and their information function peaks are lower than 0.4 suggests that the AMT-P could be more difficult for children under 4½ years than for older children.

Figure 3. Item response functions and item information functions for two selected items from the one-dimensional response model for children between 39 and 52 months.
Figure 4 shows the test information function for both samples of children, which plots test information (a way to express measurement precision) as a function of memory specificity. Test information is the sum of information conveyed by each individual test item. The figure shows the range of values for the AMT where the scale is most useful. It can be seen in the figure that the point where the information reaches its maximum level in both samples is close to zero in the latent trait of memory specificity. The standard error of measurement (SEM) corresponding to a score of 0 in the latent trait is 0.23 and 0.30 for the 53- to 78-month group and the 39- to 52-month group, respectively. Regarding the 53- to 78-month group, the range of the latent trait where the SEM is less than 0.5 extends from −2.6 to 1.4. As regards the 39- to 52-month age-group, the range of the latent trait where the SEM is less than 0.5 extends from −1.7 to 2.5. Outside both ranges, the SEM increases quickly.
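The link between test information and the SEM can be made explicit with the standard IRT relation SEM(θ) = 1/√I(θ). The sketch below assumes that the reported values follow this relation on the standardized trait scale; the function names are hypothetical, and only the SEM values come from the text.

```python
import math


def sem_from_information(information: float) -> float:
    """Standard error of measurement implied by a test information value."""
    if information <= 0:
        raise ValueError("information must be positive")
    return 1.0 / math.sqrt(information)


def information_from_sem(sem: float) -> float:
    """Test information implied by a standard error of measurement."""
    if sem <= 0:
        raise ValueError("SEM must be positive")
    return 1.0 / sem ** 2


# SEM values reported at a latent trait score of 0.
info_older = information_from_sem(0.23)    # 53- to 78-month group
info_younger = information_from_sem(0.30)  # 39- to 52-month group
info_cutoff = information_from_sem(0.50)   # boundary used for the useful range

print(round(info_older, 1), round(info_younger, 1), round(info_cutoff, 1))
```

Under this assumption, the reported SEMs of 0.23 and 0.30 correspond to test information of about 18.9 and 11.1 at the mean of the trait, and the SEM < 0.5 criterion corresponds to test information above 4.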

Figure 4. Test information function (solid line) and standard error of measurement (dashed line) for the Autobiographical Memory Test.
Discussion
The aim of this study is to validate the AMT-P test, which was developed for use with preschool populations. First, the factor structure of the test was examined. Second, IRT was used to analyze its psychometric properties in order to assess the usefulness of its application in samples of preschool children.
Regarding the factor structure of the AMT-P, the CFA and IRT results show that the items form a unified measure of one underlying construct in both samples. The test, therefore, measures a one-dimensional construct of AM specificity. This result is in line with previous studies conducted with adult populations (Griffith et al., 2009; Griffith, Kleim, et al., 2012), adolescent populations (Heron et al., 2012), and preschoolers (Nuttall et al., 2014). However, the CFA results show that the AMT-P task functions differently with preschoolers under and over age 4½ years. While the complete version of the AMT-P provides a good fit when used with preschoolers older than 4½ years, with preschoolers younger than this, it only has a good fit when one of the items (sad) is omitted. This item was omitted from the scale for the 39- to 52-month age-group due to its low factor loading in the test. The item “sad” probably performs badly in the AMT-P due to the different distribution of memories for this item in regard to the other cue words.
With regard to the psychometric properties of the AMT-P, the IRT analyses show a good level of discrimination and information for the 53- to 78-month age-group. Happy and share seem to be the simplest items and, consequently, their informative capacity regarding the memory specificity trait is lower. However, these items were kept in the scale because they had adequate indices of discrimination, and their presence enhances the general statistics of the model. The IRT analyses of the 39- to 52-month age-group show that the items have a good level of discrimination. In this case, the simplest cue word with the least information capacity is also happy, but as in the case of the other sample, it was decided to keep the item in the test given its high level of discrimination and since its presence enhanced the general statistics of the model.
Regarding the item thresholds in the 53- to 78-month age-group, the thresholds to achieve a specific memory are close to the mean for all the items. This means that in general, the preschoolers below the mean in the memory specificity trait are more likely to achieve a nonspecific memory (categoric, extended, etc.) than a specific one. This result seems to indicate that the difficulty of the task is appropriate to this age-group. Finally, the test is more accurate and the error of measurement is lower in the case of preschoolers who score around the mean. The accuracy decreases and the error of measurement increases the further the score is from the mean. However, the results obtained in the 39- to 52-month age-group are different. In this group, the thresholds to achieve a specific memory lie above the mean. This means that in general, even some preschoolers above the mean in the memory specificity trait are more likely to achieve a nonspecific memory (categoric, extended, etc.) than a specific one. This result seems to indicate that the task is more difficult for this age-group. Finally, the test is more accurate and the error of measurement is lower in the case of preschoolers who score slightly above the mean.
Broadly speaking, the results of the IRT are different for each age-group, which means that the task is appropriate for preschoolers from the age of 4½ years and that the test is more difficult and less informative for younger preschoolers, although we think it can still be administered to samples of this age-group. The difficulty of the AMT-P for the 39- to 52-month age-group could, to some extent, be expected, given that the basic ability to retrieve and report specific memories, despite emerging from the age of 3 years (Fivush & Nelson, 2004), does not stabilize until around the age of 4½ years (Bruce et al., 2000). Additionally, Nieto, Ros, Ricarte, and Latorre (2015) find that the capacity of preschool children to retrieve specific autobiographical memories is best explained by the executive functions. Taking into account that during the preschool years there are significant advances in executive functioning abilities (Carlson, 2005; Pritchard & Woodward, 2011; see Diamond, 2013, for review) and that different theories have underlined the role of these functions in the access to specific autobiographical memories (Conway & Pleydell-Pearce, 2000; Williams et al., 2007), these cognitive abilities could have acted as modulating variables between age and specificity in the study sample. Finally, as regards the level of information provided by the AMT-P in younger preschoolers, although the AMT-P cue words are in general less informative in the 39- to 52-month group than in the 53- to 78-month group, we believe that the information parameters obtained are still adequate to recommend the use of the AMT-P in the younger group. The test information function of the AMT-P shows that the difference between the two groups is relatively small. In fact, both groups show similarly low SEMs in the specificity latent trait. The lower values of the 39- to 52-month group in the range of the latent trait where the SEM is less than 0.5 are probably due to the greater difficulty of the AMT-P in this age-group.
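As a brief aside on the SEM criterion mentioned above, the standard IRT relation between the SEM and test information can be written out. The notation is ours, not taken from the study: $\theta$ is the specificity latent trait, $I_j(\theta)$ is the information of cue word $j$, and $I(\theta)$ is the test information.

```latex
\mathrm{SEM}(\theta) = \frac{1}{\sqrt{I(\theta)}},
\qquad
I(\theta) = \sum_{j} I_j(\theta),
\qquad
\mathrm{SEM}(\theta) < 0.5 \iff I(\theta) > 4 .
```

Read this way, the range of the latent trait in which the SEM is below 0.5 is simply the range in which the summed information of the cue words exceeds 4.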
Broadly speaking, our results coincide with those of Nuttall et al. (2014). The test follows a one-dimensional model of specificity, and the different parameters of discrimination and information are adequate. These positive results are maintained despite the differences between the two studies regarding tasks and procedures. The use of pictograms in our test is an example. However, the greatest difference between the two tests is probably the use of different cue words: of the 10 words used, only “happy” and “sad” were included in both. Several authors consider the set of cue words used in the test to be one of the main limitations of the AMT. The results are associated with the cue words selected, and so the use of different words could lead to different results (Griffith, Sumner, et al., 2012). In this regard, it might be expected that the language used in the task could affect the results. As mentioned previously, in the specific case of preschool children, the educational characteristics of the population should be taken into account, since educational levels commonly differ between countries (e.g., Cryer et al., 1999; Montie et al., 2006). Consequently, we believe it is important to develop validated versions of the AMT in different languages, adapted to the educational characteristics of the preschool population. In this respect, our results show that the AMT-P is a valid task to be administered in Spanish preschool populations.
Our study follows no specific procedure to verify the veracity and accuracy of the children’s memories. However, this is common to studies using the AMT as a tool to assess memory specificity and OGM. In fact, to our knowledge, no study has checked the veracity of the memories generated using any of the various versions of the AMT, in either adult or child populations. Similarly, the study by Nuttall et al. (2014) reports no method for controlling memory veracity. This is because the main aim of these studies is to assess a cognitive style of AM retrieval (OGM) and not the veracity and accuracy of the memories generated. Indeed, recent AM theories hold that our autobiographical memories are reconstructed each time they are retrieved. This is known as memory reconsolidation (Alberini & LeDoux, 2013). As memories are retrieved over time, they may be modified, resulting in memories that, to a greater or lesser extent, may differ from the actual event experienced. Furthermore, a number of studies confirm the ability of young children to retrieve accurate autobiographical memories. Some studies actually show that adults and older children generate more false memories than preschoolers (for a review, see Brainerd, Reyna, & Ceci, 2008). From a forensic psychology perspective, Davies and Pezdek (2010) state that children’s autobiographical memories are remarkably accurate. Finally, recent research with preschoolers by Alonso, Ros, Ricarte, and Latorre (2015) shows that the ability to remember specific events in the AMT-P correlates strongly with the veracity of children’s memory of a specific event.
This study has some limitations. The first is related to the characteristics of the AMT itself. Some authors have questioned whether the AMT truly assesses AM specificity or simply reflects a style of specific or general response to the presentation of cue words (Griffith, Sumner, et al., 2012). The second limitation is that, although evidence exists for the association of OGM with certain forms of psychopathology, our study sample comprised participants without a previous history of psychopathology. Consequently, our findings cannot be generalized to clinical populations.
To conclude, our results suggest that the AMT-P is a valid instrument for the assessment of OGM in preschoolers. The one-factor structure of AM specificity is valid for both age samples, but the indices of discrimination and information are better in the 53- to 78-month age-group. Although the task is more difficult for the younger age-group, the indices show that it can still be administered. The use of instruments to assess AM in preschoolers not only is feasible, as demonstrated in this study, but could also be highly useful in assessing the variables associated with the emergence and development of AM through the preschool years.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study has been supported by the Regional Government of Castilla La Mancha (Consejería de Educación y Ciencia de Castilla La Mancha, Grant PII1I09-0274-8863) and the Ministry of Science and Innovation (Ministerio de Ciencia e Innovación, Grant PSI2010-20088).
