Abstract
In this article data are presented that suggest that individual language learners may have different optimal times of the day for learning a foreign language. Learners were tested on vocabulary learning and retention at different times of the day. In addition, different components of language aptitude were tested. Two components of the language aptitude test show an effect of early risers (‘larks’) performing better in the morning than in the evening, and late risers (‘owls’) performing better in the evening than in the morning. It is argued that chronotype should be included as one of the individual differences components in second language development.
Introduction
In this contribution it is argued that chronotype (CT) should be added to the list of well-established individual differences between language learners, such as motivation and cognitive style. We provide data suggesting that learners differ in the optimal time of the day for language learning. The focus will be on the interaction between time-of-day effects and individual patterns, referred to as CTs, in language processing. While there is a considerable body of knowledge on the impact of circadian rhythm on various aspects of cognitive processing, language learning and, in particular, second language development (SLD) have not been studied from this perspective. In this contribution we report on a study on the effects of CT on one specific aspect of SLD: vocabulary learning and retention. In this study, participants with different chronotypes (‘larks’ versus ‘owls’) performed a word learning task at their preferred and non-preferred time of the day. In addition, we established the individuals’ language learning aptitude.
Individual differences in second language development
Over the last decade the awareness of the importance of individual differences in SLD has grown considerably (see Dörnyei, 2005, for an overview). Rather than seeing groups of learners as homogeneous, researchers are now focusing more on differences in development between learners. The factors studied so far in research on individual differences include:
– motivation;
– attitude;
– working memory capacity;
– learning style;
– personality type;
– language aptitude;
– anxiety/willingness to communicate.
Until recently, individual differences have been seen as relatively stable traits, that is, they partly define language development but are not influenced by it. Dörnyei in his 2009 book on the psychology of the language learner has proposed a different perspective, inspired by notions from Dynamic Systems Theory (see de Bot, Lowie, & Verspoor, 2007; de Bot & Larsen-Freeman, 2011; Larsen-Freeman & Cameron, 2008, for detailed accounts of Dynamic Systems/Complexity Theory and SLD). In this perspective individual differences are seen as dynamic rather than static and interacting with each other over time. The degree of dynamicity may vary between factors: motivation and attitude are more likely to be influenced by external factors than the other ones, but there is growing evidence that even supposedly stable factors such as language aptitude and working memory are in fact less static than assumed (Gass & Lee, 2011). Personality style, anxiety and learning style are more on the static side of the continuum. In this contribution it will be argued that CT is part of the list of relevant individual differences, also in the area of SLD, and that it is more static than dynamic.
Circadian rhythms and chronotypes
Circadian rhythms refer to cyclic variations in neurophysiology in humans and other living organisms. Here we will limit ourselves to human cognition and look at time-of-day effects on performance in different types of tasks. As Roenneberg, Wirz-Justice, and Merrow (2003) mention: ‘Our daily life is organized in three different clocks: a solar clock, providing light and warmer temperatures during the day; a social clock which we see or hear first thing on a working day; and a biological clock, which we sense most vividly when jet lagged, during shift work or when adjusting to daylight savings time.’ (p. 80). So there are three clocks, but they do not always work in synchrony. The biological clock is controlled by the suprachiasmatic nuclei in the hypothalamus, a structure that controls periodic biological functions and that acts as a kind of internal clock or pacemaker. This pacemaker is more or less autonomous, but it is influenced by various ‘Zeitgebers’, factors that affect the sleeping rhythm, such as changes in daylight and social pressure. The adaptation to the different clocks has been labeled ‘entrainment’ in the chronobiology literature.
Entrainment takes place through light, which synchronizes the internal clock (which does not run on an exact 24-hour cycle) with the external clock, mediated by social time effects (work, school, partner, children). Through training the human system can be entrained to shift its rhythm, but within limits. Some of these limits have been shown in a study on the impact of daylight saving time. Based on a survey among 55,000 people in different countries in Central Europe, Kantermann, Juda, Merrow, and Roenneberg (2007) have shown that the introduction of daylight saving time has substantial negative effects, in particular for late CTs: ‘Our data indicate that the human circadian system does not adjust to daylight saving time and that its seasonal adaptation to the changing photoperiods is disrupted by the introduction of summertime.’ (2007, p. 1996). The spring transition, in which one hour of sleep is lost, turns out to be more disruptive than the fall one.
Roenneberg (2004) mentions the decline of human seasonality, the living by the seasons. He argues that this is partly due to ‘shielding from natural Zeitgebers’ (p. 195), partly to social time effects, but probably also to daylight savings time.
The establishment of chronotypes
In the last decades a number of instruments have been developed to assess human CTs. In this study we will be using a widely used scale that has been developed by a consortium of universities in Europe, the Munich Chronotype Questionnaire (MCTQ; Roenneberg et al., 2003). The questionnaire combines different types of information to assess CT:
beginning and end of sleeping time;
work days and free days;
time of energy dip;
time needed to be fully awake (‘sleep inertia’);
exposure to daylight;
sleep debt: difference between work days and free days.
Normal sleeping time is time on free days minus sleep debt, and mid-sleep is taken as a referent point for CT, so a sleep onset at 11.30 p.m. and wake up at 7.30 a.m. lead to a 3.30 a.m. mid-sleep score.
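The mid-sleep reference point can be illustrated with a minimal sketch. It covers only the midpoint between sleep onset and wake-up and assumes that any sleep-debt correction has already been applied; the function name `mid_sleep` is hypothetical and is not part of the MCTQ.

```python
from datetime import datetime, timedelta


def mid_sleep(onset: str, wake: str) -> str:
    """Return the midpoint between sleep onset and wake-up as 'HH:MM'.

    Times are 24-hour 'HH:MM' strings. Wake-up is assumed to fall within
    24 hours after sleep onset (illustrative assumption).
    """
    fmt = "%H:%M"
    start = datetime.strptime(onset, fmt)
    end = datetime.strptime(wake, fmt)
    if end <= start:
        # Sleep crossed midnight, so wake-up belongs to the next day.
        end += timedelta(days=1)
    midpoint = start + (end - start) / 2
    return midpoint.strftime(fmt)


# The example from the text: onset 11.30 p.m., wake-up 7.30 a.m.
print(mid_sleep("23:30", "07:30"))  # 03:30
```

Working with full datetime values rather than bare clock times keeps the calculation correct when the sleep period crosses midnight, which is the usual case.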
Larks and owls
While in the larger population CT is normally distributed, there is a tendency in the CT literature to focus on late versus early sleepers, typically referred to as owls and larks. There is an extensive literature on characteristics of the two extreme CTs. Giampietro and Cavallera (2007) present an overview of this literature and list the following characteristics of the two types.
Larks are typically:
conscientious;
trustworthy;
emotionally stable;
prone to more cognitive failure, particularly during the evening;
more introvert, more neurotic;
less creative.
Owls are typically:
creative, original, flexible;
emotionally unstable;
more bound to have difficult social and familial relations;
more bound to psychological and stress-bound disorders;
showing lower academic performance;
low in appetite in the morning;
‘not proactive in getting ready for school’ (Giampietro & Cavallera, 2007, p. 454);
smoking more;
drinking more coffee and alcohol;
showing more non-conformist behavior;
showing higher stress rates;
‘sensation seekers’.
An interesting point the authors raise is that it is not clear what causes what: ‘Being in a condition which diverges from conventional habit - nocturnal types often experience this situation - may encourage the development of a non-conventional spirit and of the ability to find alternative and original solutions that eventually aim to find a compromise between inner and outer demands.’ (Giampietro & Cavallera, 2007, p. 461)
Of course the lark/owl distinction makes a dichotomy of what is more a continuum: as mentioned, CT is normally distributed, and most individuals will be in the middle between the two extremes and many will be more lark-like or more owl-like. In the literature on CT, larks and owls are prototypes that reflect certain time-of-day modulations. Schmidt, Collette, Cajochen, and Peigneux (2007) conclude that ‘time-of-day modulations affect performance on a wide range of cognitive tasks measuring attentional capacities, executive functioning and memory’ (p. 755). They also point out that there are striking differences in CT between age groups: young children tend to be early risers until they reach puberty, after which they switch to a much later type. With old age, starting in the late 50s, the early risers’ pattern tends to dominate. More than 70% of the elderly population is of the lark type, while this only holds for 7% of adolescents. This has important implications for the assessment of differences in cognitive functioning in the elderly compared to younger age groups: it may be that a part of the differences found can be explained by time-of-day effects. Testing typically takes place in the afternoon, when elderly persons are beyond their peak performance, while for young people this is the optimal time.
Chronotype and learning
As Schmidt et al. (2007) show in their overview of research on circadian rhythms in human cognition, the interest in such cyclic performance fluctuations at different times of the day started out with an interest in the best time of the day for teaching in schools (Gates, 1916; Laird, 1925). Laird studied college students who had to do a number of tasks, mainly memory based, at different times of the week and day. The data from 115 students show that what the author calls ‘the typical student’ performed best on Wednesday and that performance declined gradually over the course of the day, from 8 a.m. to 5 p.m. Blatter and Cajochen (2007) refer to the pioneering work by Nathaniel Kleitman (1933), who tried to assess the best time of the day for teaching in schools. He also carried out some of the first controlled experiments on cognitive functioning and time of the day and found that performance is best in the afternoon and worst late at night and early in the morning. Since that time, the study of circadian rhythms has moved away from school contexts and even learning in general. The bulk of the studies Schmidt et al. mention focus on attention, memory and executive functions.
Chronotype and language processing
The literature on the impact of CT on language use is still fairly limited and so far restricted to studies that focus on the processing of specific elements at different times of the day. Reinberg, Ugolini, Motohashi, Fravigny, and Bicakova-Rocher (1988) looked at prelexical access of syllables and syntactic processing in school children. They found that syllable processing peaks at 7.30 p.m., while sentence processing peaks at 9.00 a.m. Morton and Diubaldo carried out a number of studies (1993, 1995) on processing of voicing and spelling proficiency and found superior detection of voiced stimuli in the afternoon compared to the morning and more phonetically inappropriate errors in the afternoon. Dietrich (2006) looked at syntactic comprehension of complex sentences and found the best performance in late afternoon, which seems to be in line with Reinberg et al. (1988). Rosenberg, Pusch, Dietrich, and Cajochen (2009) report on a gender congruency reaction time task in German. Participants had to indicate whether the article (masculine/feminine/neuter) was congruent with the noun. So ‘Die Biene’ would be congruent, while ‘Das Biene’ is incongruent because ‘Biene’ is feminine. In this study a constant routine protocol in a 40-hour sleep deprivation design was used with normal CTs, that is, no pronounced larks or owls. Participants had to do reaction time tasks at different times of the day, while they were kept awake for 40 hours. Not surprisingly, a clear effect of sleep deprivation over time was found, and the best performance was observed 3–6 hours after habitual waking time.
These studies provide no explanation for the effects found. It may be that underlying cognitive factors, such as memory performance or attention, are affected at different times of the day and that these in turn lead to differences in processing of different aspects of language, but there is as yet not enough research to even begin to explore such connections.
Second language development
All studies mentioned in the previous section focus on the participants’ first language skills. No studies have been found that looked at the impact of CT on SLD. SLD encompasses a whole range of skills at various levels. SLD is the development of skills in a second language ranging from phonology and articulation to higher order activities, such as sentence formation and discourse and text processing. As yet it is unclear what aspects of SLD may be influenced by CT effects. There may be language-specific aspects, such as the ones mentioned in the previous section, but also more general cognitive effects such as memory performance and attention that have an impact on language processing. One of the factors that has been shown to have an impact on SLD is language learning aptitude. Based on the early work by Carroll and Sapon (1959) and Pimsleur (1966), a number of tests have been developed that typically looked at sound discrimination, lexical memory and grammatical inferencing. The predictive value of such aptitude for learning a foreign language in an instructional setting is fairly high, but there is debate about the contributions of the different components of aptitude (Dörnyei, 2009, p. 208). It has been argued that aptitude tests largely measure the same as intelligence tests and that IQ is in fact the underlying factor explaining most of the variance.
Aim of the study
The aim of this study was to explore whether there might be an effect of CT on SLD. Language aptitude tests should give us a general impression of the sensitivity of different language learning components to CT effects. In addition, we will be focusing on one aspect, vocabulary learning and retention. This is generally considered to be a core aspect of SLD. It was decided to focus not only on vocabulary learning, but also on retention and relearning, building on a long tradition that started with the seminal work by Ebbinghaus in the late 19th century. Ebbinghaus argued that the relearning provides a window on degree of retention of lexical knowledge. The assumption was that different CTs will lead to time-of-day effects, with larks generally doing better in the morning and owls performing better in the early evening.
In this experiment we wanted to test the following hypotheses:
Larks are better than owls in learning, relearning and retaining words acquired in the morning, while owls are better than larks in learning, relearning and retaining words acquired in the early evening.
Larks have a higher language learning aptitude than owls in the morning, while owls are better than larks in the early evening.
Design of the study and instruments used
The whole experiment consisted of the following sessions.
Administration of the MCTQ test to assess the participants’ CT.
Word learning session M in the morning (between 8 and 9 a.m.) followed by a lexical test (M1).
A feedback session in which the participants got individual feedback on their M1 test followed by a second retention test (M2).
Word learning session E in the early evening (between 5 and 6 p.m.) a few days after the M1/M2 session followed by a lexical test (E1).
A feedback session in which the participants got individual feedback on their E1 test followed by a second lexical test (E2).
A retention test in the early afternoon (A) two weeks after the M and E sessions. There were two versions of the A test, each consisting of half of the M items (AM) and half of the E items (EM). While the M and E tests were administered with all participants as a group, in the A session there were two groups, each consisting of half of the larks and half of the owls. So both groups were tested on words they had learned in the morning and words they had learned in the evening.
Administration of the Llama language aptitude test. Half of the larks and owls did this test in the morning, while the other half did the test in the early evening.
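The counterbalancing in the retention session can be sketched as follows. This is an illustration only: the text does not say how the halves were chosen, so the alternating split used here is an assumption, and the function names and identifiers are hypothetical.

```python
def build_retention_versions(m_items: list[str], e_items: list[str]):
    """Split the morning (M) and evening (E) items over two test versions.

    Assumption: items are split by alternating list position; the text
    does not say how the halves were selected.
    """
    version_1 = m_items[0::2] + e_items[0::2]
    version_2 = m_items[1::2] + e_items[1::2]
    return version_1, version_2


def assign_groups(larks: list[str], owls: list[str]):
    """Form two groups, each with half of the larks and half of the owls."""
    group_1 = larks[0::2] + owls[0::2]
    group_2 = larks[1::2] + owls[1::2]
    return group_1, group_2


# Hypothetical item labels: 30 morning items and 30 evening items.
m_items = [f"m{i}" for i in range(30)]
e_items = [f"e{i}" for i in range(30)]
v1, v2 = build_retention_versions(m_items, e_items)
print(len(v1), len(v2))  # 30 30
```

With 30 items per learning session, each version contains 15 morning items and 15 evening items, so every participant is tested on words from both learning times.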
Participants
A large group of university students from the Faculty of Arts of the University of Groningen were invited to fill out the MCTQ (Roenneberg et al., 2003). On the basis of the outcomes two groups were formed, an early ‘lark’ group and a late ‘owl’ group, using the median of the mid-sleep scores as the criterion to split the group in two. Figure 1 presents the distribution of the mid-sleep scores.

Figure 1. Mid-sleep distribution of participants.
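The median split on mid-sleep scores can be sketched as follows. The participant labels and scores are invented for illustration, and the handling of scores exactly at the median is an assumption, since the text does not specify it.

```python
from statistics import median


def median_split(mid_sleep_hours: dict[str, float]) -> dict[str, str]:
    """Label each participant 'lark' or 'owl' relative to the group median.

    Assumption: scores equal to the median are placed in the lark group;
    the text does not say how ties were handled.
    """
    cutoff = median(mid_sleep_hours.values())
    return {
        participant: "lark" if score <= cutoff else "owl"
        for participant, score in mid_sleep_hours.items()
    }


# Hypothetical mid-sleep scores in decimal hours after midnight.
scores = {"p1": 3.5, "p2": 4.25, "p3": 5.0, "p4": 6.5}
print(median_split(scores))
# {'p1': 'lark', 'p2': 'lark', 'p3': 'owl', 'p4': 'owl'}
```

A split of this kind assigns a label to everyone, including participants close to the median, so the two groups differ in degree rather than in kind.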
Language learning aptitude
All participants were tested using Meara’s (2005) Llama Language Aptitude test. In order to lower the effect of time of the day, this test was administered a few days before the first learning session. The test was administered at preferred and non-preferred times of the day. The Llama test consists of four parts.
– B: A vocabulary learning task: the participants had 120 seconds to memorize the names of 20 (new) objects.
– D: A sound recognition task: the participants had to recognize meaningless sound sequences that they had heard before. First they heard a set of sound sequences to be remembered, followed by a larger set that contained both sequences presented earlier and new sequences.
– E: A sound–symbol correspondence task: the participants had to match 22 unknown sounds with a (fantasy) spelling (e.g. 9e).
– F: A grammatical inferencing task: the participants were presented with a short sentence in an unknown language and the translation in English and had to infer the rules of that new language on the basis of these sentences.
Scores on all four tests were normalized to a score between 0 and 100, with 20–45 being the average score range.
In session M1 the participants performed a vocabulary learning task consisting of 30 pairs of Dutch words and pseudo-words to be remembered using flash cards. In this task they were presented with a card showing the letter sequence to be learned. Examples of pseudo-word/Dutch word pairs used are spich/laars (English: boot) and zerof/laken (English: sheet). Words and pseudo-words were controlled for bigram frequency and lexical near neighbors. At first presentation the new word was presented with its translation in Dutch/English, while on consecutive presentations the new word was presented first, and when the participant did not know it, the Dutch translation was given beneath the new word. Words were presented list-wise. Once a word had been translated correctly twice, it was removed from the list. This procedure continued until all words had been learned.
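The dropout procedure described above can be sketched as a simple loop. This is an illustration under stated assumptions, not the software used in the study: the `answer` callback is a hypothetical stand-in for the participant, the two correct translations are counted cumulatively rather than consecutively, and `max_passes` is a safety limit added for the sketch.

```python
from typing import Callable


def learn_to_criterion(
    pairs: dict[str, str],
    answer: Callable[[str], str],
    criterion: int = 2,
    max_passes: int = 100,
) -> int:
    """Cycle through the list until each word has been translated correctly
    `criterion` times; return the number of test presentations.

    `answer` receives a pseudo-word and returns the attempted translation.
    """
    correct = {word: 0 for word in pairs}
    remaining = list(pairs)
    presentations = 0
    passes = 0
    while remaining and passes < max_passes:
        passes += 1
        still_open = []
        for word in remaining:
            presentations += 1
            if answer(word) == pairs[word]:
                correct[word] += 1
            # After an error the translation would be shown again here.
            if correct[word] < criterion:
                still_open.append(word)
        remaining = still_open
    return presentations


# Two of the pairs mentioned in the text, with a participant who never errs.
pairs = {"spich": "laars", "zerof": "laken"}
print(learn_to_criterion(pairs, lambda word: pairs[word]))  # 4
```

Under this scheme the number of presentations needed to reach the criterion could serve as a learning measure alongside the test scores, although the text does not state that it was used in this way.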
Two weeks later the participants were presented with a list of words consisting of 30 pseudo-words learned in session M1, 15 from the set learned at the preferred time of the day and 15 from the set learned at the non-preferred time of the day. The procedure was the same as in the first session.
The design consisted of one between-group variable (larks versus owls) and two within-group variables (CT and time of testing).
As mentioned, the aptitude test was administered in the morning for half of the larks and owls and in the evening for the other halves.
Stimuli
The 60 pseudo-words consisted of four or five letters beginning and ending with a consonant and were legal sequences in Dutch. Examples are ‘BLONS’, ‘SPORF’ and ‘KLON’. The pseudo-word status of the words was checked in the largest dictionary for the Dutch language, the Woordenboek der Nederlandse Taal. Pseudo-words were paired with real Dutch words in such a way that none of the ‘neighbors’ (real words differing from the pseudo-word in one letter, e.g. ‘KLON’ -> ‘KLOK’ (clock)) was related in meaning to the pseudo-word. Words and pseudo-words were controlled for bigram frequency.
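The two formal checks on the stimuli, word shape and one-letter neighbors, can be sketched as follows. The tiny lexicon is hypothetical, the letter ‘y’ is treated as a consonant for simplicity, and the sketch does not test whether a string is a legal sequence in Dutch or whether a neighbor is related in meaning.

```python
import string

VOWELS = set("aeiou")


def has_pseudo_word_shape(item: str) -> bool:
    """Four or five letters, beginning and ending with a consonant."""
    word = item.lower()
    return (
        len(word) in (4, 5)
        and word.isalpha()
        and word[0] not in VOWELS
        and word[-1] not in VOWELS
    )


def one_letter_neighbors(item: str, lexicon: set[str]) -> set[str]:
    """Real words that differ from `item` in exactly one letter position."""
    word = item.lower()
    neighbors = set()
    for i in range(len(word)):
        for letter in string.ascii_lowercase:
            candidate = word[:i] + letter + word[i + 1:]
            if candidate != word and candidate in lexicon:
                neighbors.add(candidate)
    return neighbors


# Tiny hypothetical lexicon; the study checked a full Dutch dictionary.
lexicon = {"klok", "klos", "laken", "laars"}
print(has_pseudo_word_shape("KLON"))                  # True
print(sorted(one_letter_neighbors("KLON", lexicon)))  # ['klok', 'klos']
```

The neighbors returned by such a check would still have to be compared by hand with the meaning of the paired Dutch word.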
Results
Language aptitude test
For the language aptitude test no significant differences were found between the two groups overall. However, when time of testing was included a different picture emerged. Interactions between CT and time of testing are presented in Figures 2–5.

Figure 2. Interaction plot for B-part Llama test.

Figure 3. Interaction plot for D-part Llama test.

Figure 4. Interaction plot for E-part Llama test.

Figure 5. Interaction plot for F-part Llama test.
For the B-part no significant interaction was found (F (1, 29) = 2.03, p > .10).
The interaction between CT and time of testing was not significant (F (1, 29) = 1.17, p > .10).
The analysis of variance (ANOVA) showed that there is no significant main effect of time of day, but a significant interaction between CT and time of testing (F (1, 29) = 9.357, p < .005).
There was a significant interaction between CT and time of testing for the E-part (F (1, 29) = 12.25, p < .01).
There was a significant interaction between CT and time of testing for the F-part of the test (F (1, 29) = 4.65, p < .05).
Additional analyses showed that larks are better in the morning than in the evening on the E-part (t (13) = 3.42, p < .005) and owls are better in the evening than in the morning for the B-part (t (12) = 2.79, p < .05). In addition, larks tend to be better than owls in the morning only on the E-part (t (13) = 1.93, p = .075) and owls are better than larks in the evening on the E-part (t (12) = 3.12, p < .01) while there was a tendency in the same direction for both the D- and the F-part (t (12) = 1.89, p = .082 and t (12) = 1.73, p = .109). These data suggest that the E-part, the sound–symbol correspondence task, is most sensitive to CT and time-of-day effects.
Word learning experiment
First we looked at the scores in the morning and evening tests. Test 1 refers to the test right after learning, while test 2 refers to the test after the participants received feedback on their errors in test 1. Table 1 presents the means and SDs for the four tests.
Table 1. Means and standard deviations for morning and evening tests 1 and 2.
There was a highly significant correlation between M1 and E1 (r (30) = .73, p < .001) and between M2 and E2 (r (29) = .75, p < .001), and the difference between the M1 and E1 tests was significant (t (29) = 4.22, p < .001), as was the difference between M2 and E2 (t (28) = 4.37, p < .001). In other words, participants behaved similarly on the two tests, but overall scores were higher in the evening than in the morning. This was not different for larks and owls: differences between groups were not significant, and neither was the interaction between CT and time of learning. It could be that the set of items in the morning tests was harder to learn.
In order to test the effect of feedback in vocabulary learning by larks and owls we looked at the M2/E2 tests with M1/E1 tests as covariate and CT as the independent variable. No significant effect of CT was found (F < 1).
In order to test retention in the retest two weeks after learning, we tested all subjects in the afternoon. In the analysis we looked at retention of words learned in the morning (M2 test) and words learned in the evening (E2 test) in the post-test. For this we looked at post-test scores for morning words with M2 as a covariate and for evening words with E2 as a covariate, with CT as the independent variable. The means are presented in Table 2.
Table 2. Means and standard deviations for larks and owls for morning words and evening words.
CT: chronotype.
For the morning words there was a tendency towards significance for the difference between the two groups (F (1, 25) = 3.53, p = .072); a similar effect was found for the evening words (F (1, 26) = 3.04, p = .093). The interaction between CT and groups of words did not reach significance (F < 1). The very high SDs suggest large differences between participants, in particular for the owls. The outcomes suggest a better retention potential for the larks for both groups of words and no differential effect for CT. Given the very low retention scores and high SDs for the owls, a lack of motivation to really do well on the retention tests cannot be excluded as an explanation.
Next we looked at the overall effect of testing in the morning or in the evening in relation to mid-sleep scores. The outcomes of the correlational analysis show that none of the correlations between the mid-sleep score and the word learning scores (M1, M2, E1, E2) reaches significance.
Discussion
In this contribution a first attempt is made to test the impact of CT on a number of aspects of SLD. The study focused on language aptitude on the one hand and word learning and retention on the other hand.
For the aptitude tests, a significant CT by time of testing interaction was found for two out of four parts: for sound–symbol correspondence and grammatical inferencing participants scored significantly better at their preferred time of the day, the morning for larks and the early evening for owls. No effect was found for the vocabulary learning part of the aptitude test, and this is in line with the findings from the word learning experiment. In this experiment we looked at the relation between CT and time of testing as independent variables and word learning, relearning and long-term retention as the dependent variables. The main findings were that learning is better in the evening than in the morning for both groups, but there is no differential effect for time of testing by CT. The same was found for the relearning of words. For the retention part of the experiment it was found that larks score higher on both words learned in the morning and words learned in the evening, but the effect was not very strong due to very high SDs and generally low scores, which may be caused by a lack of motivation in doing the test.
The experiment presented here is no more than a first attempt, and it has been useful to point out a number of issues to be taken into account in further research along those lines. These include the following.
– Time of testing: the late afternoon or early evening may not be late enough to find differences between larks and owls; test administration after 10 p.m. may be more appropriate.
– Selection of participants: to find stronger CT effects, participants with more extreme CTs could be included in the study. Here the participants were not selected a priori on their CT, so the two groups formed on the basis of the median mid-sleep score in fact divide a continuum rather than represent two distinct types.
– The participants tested were all students, who may show an atypical CT pattern. Other groups should be included to draw more general conclusions. An interesting extension of this study would be to include international students with different cultural backgrounds.
– Number of participants: in order to test the second-order interactions in this design, a substantially larger group is needed. Ideally the scores on the language aptitude test should be used as covariate in the learning/retention study, and this calls for even larger numbers.
– Motivation may interact with CT and time of testing. There are at least suggestions in our data that participants were less motivated to do well at non-preferred times of the day. A stricter protocol and making the financial compensation dependent on high scores may help to solve this problem.
Here we have positioned the findings on CT within the individual differences field. As mentioned earlier, Dörnyei (2009) has argued for a new perspective on individual differences that is inspired by Dynamic Systems Theory in which the four main assumptions behind the more traditional approach to individual differences are challenged:
– ‘individual differences exist in the sense that we can identify, define and operationalize them in a rigorous scientific manner;
– individual differences are relatively stable attributes;
– different individual differences form relatively monolithic components that concern different aspects of human functioning and that are therefore only moderately related to each other;
– individual differences are learner-internal, and thus relatively independent from the external factors of the environment’. (2009, p. 182)
It seems that circadian rhythm and CT actually fit in with these assumptions, since they can be established in a valid way, are relatively stable, learner-internal and largely independent of external factors. CT may be related to personality traits, but it is unclear what defines what: does CT lead to certain personality traits or the other way around? Dörnyei’s plea for a more dynamic approach to individual differences seems to be only partly relevant for the present study, since there are fairly strong indications that CT is a stable and static characteristic of individuals and as yet no interactions with other variables over time have been found. But of course, hardly any research has been done on this so far, so it may be a bit too early to draw conclusions on the static/dynamic nature of CT as an individual characteristic.
Acknowledgements
The author is indebted to Jurrien Schuurman for his assistance in the running and analysis of the experiment and to the Chronobiology group at the University of Groningen, in particular Martha Merrow, for their inspiration.
Funding
This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.
