Abstract
Spontaneous motion synchrony between interaction partners benefits the interaction. Here we probed how musical rhythms, which are highly temporally organized, modulate this process. We video-taped conversations held in silence or with an auditory background that was metrical and regular (one measure looped), metrical and irregular (different measures in random order), non-metrical and regular, or non-metrical and irregular. Motion time-series derived from the videos entered a cross-wavelet coherence analysis showing that more musical rhythms amplified rhythm-relevant motion frequencies at the level of the individual and facilitated social synchronizing at the level of the dyad. Yet, we also observed rhythm-specific motion interference effects and reduced conversation pleasantness when compared with silence. These results indicate that musical rhythms, perhaps by imposing a temporally rigid mode of synchronizing, hinder rather than further ongoing social processes. Silence or sounds with little temporal organization and predictability seem preferable as a backdrop for interactional exchange.
Chatting with a friend in a cafe can be very enjoyable. But what apart from being with someone we like makes it fun? Is it the caffeine, the conversations of other people, or the music that’s playing in the background? Here we pursued the latter possibility by studying the effect of a musical background rhythm on the temporal coordination and subjective quality of social interactions.
Quantifying Synchrony in Social Interactions
Over the past decades, there has been an increasing interest in the cognitive and emotional processes that unfold in natural human interactions. A popular methodological approach has been to engage dyads in a fairly unstructured manner and to examine how dyad members coordinate their activity and how such coordination predicts interaction outcomes (e.g., Fujiwara & Daibo, 2016; Tschacher et al., 2014). One relevant measure has been the degree of synchrony between interaction partners in, for example, autonomic functioning (Konvalinka et al., 2011), cognitive activity (Manera et al., 2013) and gross body motion (Tschacher et al., 2014). In all cases, evidence implies a natural tendency for individuals to converge in the timing of relevant processes and for such convergence to be accompanied by cognitive and affective benefits (for a review see Hoehl et al., 2021).
Yet, how do we assess synchrony in social interactions? The first attempt to measure the temporal coordination between interaction partners was made by Condon (Condon & Ogston, 1966; Condon & Sander, 1974), who, focusing on gross body motion, studied video recordings frame-by-frame and manually coded the activity of different limbs. His work helped lay the groundwork for newer techniques, which removed the human element in time-series coding. One such technique, still relying on video, defines regions of interest around each recorded person and registers for each region the number of pixels that change from frame to frame. This results in two motion time series, which are then subjected to a cross-correlation analysis measuring the similarity between them. Alternatively, and as done in the present study, the time series may be subjected to a cross-wavelet coherence analysis (for a tutorial see Issartel et al., 2015). This method entails a wavelet transformation in which a sliding window moves along each time series yielding a two-dimensional time-frequency space. A coherence analysis of two such spaces (e.g., from two interacting individuals) returns a coefficient that may be interpreted similarly to a squared correlation coefficient and that indexes how changes in the intensity of a given motion frequency (i.e., fast/slow) were synchronized in time.
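The cross-correlation approach described above can be illustrated with a minimal sketch. It assumes two equally long motion time series and uses toy data in which one person copies the other with a fixed delay; the function name `lagged_correlation` and all parameter values are illustrative and not taken from any of the cited studies.

```python
import numpy as np

def lagged_correlation(x, y, max_lag):
    """Pearson correlation between x and y for each lag in [-max_lag, max_lag].

    A positive lag pairs x[t] with y[t + lag], i.e., y follows x.
    Assumes x and y have the same length.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lags = np.arange(-max_lag, max_lag + 1)
    corrs = []
    for lag in lags:
        if lag >= 0:
            a, b = x[: len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[: len(y) + lag]
        corrs.append(np.corrcoef(a, b)[0, 1])
    return lags, np.array(corrs)

# Toy data (hypothetical): person B repeats person A's motion 5 frames later.
rng = np.random.default_rng(0)
motion_a = rng.random(500)
motion_b = np.roll(motion_a, 5) + 0.1 * rng.random(500)

lags, corrs = lagged_correlation(motion_a, motion_b, max_lag=10)
best_lag = lags[np.argmax(corrs)]  # lag with the strongest similarity
```

Unlike this single lag-wise coefficient, the cross-wavelet coherence approach additionally resolves synchrony by motion frequency and by time.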
Using either a simple cross-correlation or the more differentiated cross-wavelet coherence approach, research has confirmed that dyadic interactions are characterized by some degree of motion alignment. For example, Tschacher et al. (2014) found significant synchrony in dyads composed of two strangers discussing a general topic of interest for 5 minutes. Similarly, Fujiwara and Daibo (2016) recorded motion during a 6-minute unstructured conversation between two strangers. They then compared the cross-wavelet coherence, henceforth simply called coherence, from the original pairs with the coherence of randomly selected pairs and found that the former yielded significantly greater synchrony than the latter.
Synchrony Promotes Positive Interactions and Vice Versa
There is now substantial evidence that interactional synchrony predicts positive interaction outcomes (Mogan et al., 2017; for a review see Schirmer et al., 2016). More synchronous dyads report more positive affect (Tschacher et al., 2014), are more likely to empathize with each other (Koehne et al., 2016), to trust one another more (Cacioppo et al., 2014) and to be more creative (Won et al., 2014). Moreover, observers are more likely to perceive affiliation between individuals the more strongly these individuals synchronize (Latif et al., 2014). Given the apparent benefits of interactional synchrony, efforts have been directed at identifying factors promoting its emergence. One factor that was highlighted as relevant is a positive relationship between individuals. Friends synchronize more readily than strangers (Latif et al., 2014) and strangers synchronize more readily when they have a good as compared with a poor first impression of each other (Cheng et al., 2020; Miles et al., 2010). Additionally, greater involvement in the interaction enhances both synchrony and rapport (Dunbar et al., 2020). Lastly, factors inherent to the individuals matter. For example, the wish to deceive enhances synchrony for faster motion expressed in higher frequencies (Dunbar et al., 2020), whereas poor social skills and autistic traits decrease synchrony (Georgescu et al., 2020; Zampella et al., 2020).
Musical Rhythms as Drivers for Synchrony
To date, few studies have examined the role of contextual factors external to the individuals (Dunbar et al., 2014). Moreover, none have addressed the potential role of musical rhythms despite the fact that such rhythms frequently accompany human interactions (Launay et al., 2016; Patel, 2014; Tarr et al., 2014) and are recognized as important synchronizing stimuli (for a review see Obleser & Kayser, 2019).
Music, like language, is a complex communication system composed of both (i) melodic or spectral features and (ii) temporal features. Temporal features are of particular relevance here as they shape what we perceive as rhythm. Importantly, the term rhythm typically characterizes not just any temporal pattern, but one that elicits the perception of a regular emphasis called the beat. Temporal structures in which sound-to-sound intervals are related hierarchically by integer ratios promote beat perception and are referred to as metrical (Jones, 1976; Jones & Boltz, 1989). Other temporal structures make it hard or impossible to perceive a beat and are referred to as non-metrical (Figure 1A). Note, however, that metricality may not be an all-or-none phenomenon, but may vary continuously (Jones, 1976; Jones & Boltz, 1989). For this reason, we use the terms high and low metricality instead and refer to any sound sequence as a rhythm.

Figure 1. Stimulus background manipulation. (A) Metrical properties of the measures used to create the background rhythms. The upper part of panel A illustrates one exemplary rhythm in its high (left) and low (right) metricality variant. Shown in blue are the within-measure intervals in milliseconds. Shown in black is the ratio of each interval to the smallest interval. The lower part of the figure illustrates the temporal position of sounds for all the measures in the experiment. The red dashed lines mark the position of the beat. (B) Histogram of tapping performance to the rhythms used in this study. The angles between taps and the measures’ beats are illustrated as petals in a roseplot for the five high (violet) and the five low metrical measures (green). Larger petals reflect more taps at a given angle. The 0 position reflects the beat, which occurred every 750 ms. (C) Regularity manipulation. Metricality and regularity were orthogonally manipulated. In the high regular condition, one of the high metrical or the low metrical measures was repeated throughout a block. In the low regular condition, we presented a random order of all high or all low metrical measures.
Apart from metricality, musical rhythms are typically characterized by a certain interval regularity. Indeed, metricality and regularity may be considered orthogonal temporal properties that together and individually can bias one to perceive a sound sequence as music (Deutsch et al., 2011; Rowland et al., 2019). Measures (i.e., the musical units that organize sounds in a beat-based manner), whether high or low in metricality, may be identical throughout a rhythm, thereby making measure intervals regular, or they may vary, thus making measure intervals irregular. For example, looping a four-beat measure in which each beat is acoustically marked maximizes regularity as well as metricality. By contrast, a rhythm in which notes fall on metrical positions while inter-note-intervals vary from measure to measure is still high in metricality, but low in regularity. Notably, sound sequences with low metricality but high regularity may “warp” one’s temporal processing and eventually create the illusion of a beat (Rowland et al., 2019).
There is a substantial literature documenting how listeners synchronize mental and behavioral processes with the temporal structure of musical rhythms (for reviews see Hoehl et al., 2021; Obleser & Kayser, 2019). Most of this work has focused on comparing periodic sequences, high in both metricality and regularity, with aperiodic sequences, low in both metricality and regularity. Results suggest that the former stimuli facilitate beat finding (Grahn & Brett, 2007), rhythmic motor synchronizing (Grahn & Brett, 2007), and the alignment of dynamic mental processes (Escoffier et al., 2010; Jones et al., 2002; McAuley & Fromboluti, 2014). Moreover, these findings were used to bolster the idea that metricality underpins these effects (Jones & Boltz, 1989; Lakatos et al., 2008; Merchant et al., 2015; Miller et al., 2013; Nozaradan et al., 2011). However, more recent work manipulating metricality and regularity separately implied the latter may also be important in synchronizing listeners (Breska & Deouell, 2017; Schirmer, Wijaya, et al., 2021).
The Present Study
As reviewed above, there is much research highlighting spontaneous motion synchrony as an important feature of positive human interactions (e.g., Dunbar et al., 2020; Miles et al., 2010; Tschacher et al., 2014). Additionally, studies showed that, individually, listeners synchronize to musical rhythms as a function of the rhythm’s metricality and regularity (for a review see Obleser & Kayser, 2019). Together, this evidence raises the possibility that musical rhythms could be an important social stimulus that boosts the temporal alignment of interaction partners thus improving social outcomes.
To address this possibility, we video-taped 5-minute conversations between two young adults on a general topic of interest. Each dyad completed five conversations with different background conditions. In one condition, participants talked in silence providing us with a baseline for spontaneous synchronizing. In the other conditions, they talked with a rhythmic background that was either high or low in metricality and either high or low in regularity. In the latter conditions, spontaneous synchronizing came under the influence of a guiding temporal structure. After each condition, participants rated conversation pleasantness.
We analyzed the motion change registered in the videos using a coherence analysis. Of interest was slow motion change represented by low frequencies, as these were previously established to encode interactional synchrony (Fujiwara & Daibo, 2016; Fujiwara et al., 2019; Wiltshire et al., 2019). Additionally, we examined two narrow frequencies associated with the background stimulus: (i) the measure onset frequency (i.e., the rate at which measures recur, with measures being the basic units organizing notes in a musical piece and giving rise to the perception of a meter), which was constant across rhythms, and (ii) its 4th harmonic, which was also the beat in the highly metrical conditions. Across low and background-specific frequencies, we pursued coherence in original dyads as well as in random cross-dyad pairs as the latter could dissociate non-social, background-driven effects from effects emerging in the interaction.
Our hypotheses were as follows.
Hypothesis 1
We reasoned that the background rhythms would synchronize motion in rhythm-specific frequencies with greater synchrony for more musical rhythms (i.e., high metricality and regularity). However, because social processes might override such frequency-specific effects, the clearest evidence for this should emerge in the analysis of random dyads.
Hypothesis 2
We expected that musical rhythms maximize temporal alignment with a partner for low frequencies previously linked to interaction synchrony. Thus, coherence should be greatest for conversations conducted on sound backgrounds that are high in both metricality and regularity and weakest for conversations conducted in silence or on sound backgrounds low in both metricality and regularity.
Hypothesis 3
We predicted that background rhythms should be relevant in modulating the subjective pleasantness of social interactions. Moreover, in line with previous work identifying a relationship between synchronizing and positive affect (Tschacher et al., 2014) and underscoring the relevance of music in coordinating human activities and enhancing well-being (Tarr et al., 2014), subjective pleasantness should be highest when both metricality and regularity are high and lowest in silence or when metricality and regularity are low.
Methods
Participants
Our considerations regarding adequate sampling were as follows. Previous research using a coherence approach to study interactional synchrony reported significant differences between real and pseudosynchrony resulting from contrasting the coherence coefficient of original with that of randomly paired dyads (Fujiwara & Daibo, 2016; Fujiwara et al., 2019; Schmidt et al., 2012, 2014; Wiltshire et al., 2019) and identified a role for contextual variables or participant sex (Fujiwara et al., 2019; Schmidt et al., 2014). The sample sizes in these studies ranged from 31 to 42 dyads. As the effect size of rhythm effects on synchrony was unknown to us, we aimed at matching the power of previous studies.
Our total number of recorded participants was 104, and they formed 52 dyads. The data from three dyads had to be discarded because of experimenter error (the recording was too short or the experimenter walked into the camera’s field of view). Because recruitment was done within a campus setting, there may have been dyads in which the two individuals were familiar with each other. However, as we recruited individuals rather than dyads we assume that for most the partner was unknown. We did not try to keep dyad familiarity constant across our sample reasoning that this factor would be constant across conditions and hence not confound our results. Yet, we balanced the sex of individuals such that of the 49 dyads that entered our analysis 17 were mixed-sex dyads, 16 were composed of two women, and 16 were composed of two men. Previous research raised the possibility of sex differences in conversation styles (Kendall & Tannen, 2015) and interactional synchrony (Fujiwara et al., 2019), and we wished to ensure that our sample is unbiased and representative of the general population.
The mean age of participants was 21 (SD = 4.5). Prior to the experiment, they completed the Barcelona Music Reward Questionnaire (Mas-Herrero et al., 2013). This instrument assesses five music domains labeled emotional evocation, sensory-motor, mood regulation, music seeking, and social reward—with each having a maximum score of 20. The reliability of these domain measures is documented with .88, .93, .87, .89, .78, respectively. In our sample, participants scored on average 15 (SD = 2), 13 (SD = 3), 16 (SD = 2), 13 (SD = 3), and 14 (SD = 3), respectively, which compares to the general population (Mas-Herrero et al., 2013). Please note that this questionnaire simply served to assess musicality or music appreciation and to ensure that our sample is not unusual in these regards.
Materials
The research materials and data are available from the authors upon reasonable request. Apart from rhythm, music entails other features (e.g., melody, harmony) that influence sound perception and attention. As we wished to understand rhythmic effects, we used stimuli that were characterized by temporal variation only, thus allowing us to rule out non-rhythmical confounds. The rhythmic backgrounds we used had (i) high metricality and high regularity, (ii) high metricality and low regularity, (iii) low metricality and high regularity, and (iv) low metricality and low regularity. The measures forming these rhythms were designed as follows. First, we composed five four-beat measures that were highly metrical and had a duration of 3 seconds. Each measure comprised six click sounds that were synthesized in Matlab. We then modified each of these measures by shifting sound onsets to positions that violated the original integer ratios to create a corresponding variant with low metricality. Whereas the intervals in the high metrical rhythms were related by ratios of 1 : 2 : 4 : 6, they were related by ratios of 1 : 3.43 : 5.43 : 8.57 in the low metrical rhythms (Figure 1A). We conducted a tapping experiment with 16 participants (eight women) not involved in the main experiment. In this experiment, each measure was looped 40 times and participants were asked to tap along. The results indicated better beat finding for measures high as compared to low in metricality (Figure 1B).
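The difference between the two sets of interval ratios can be sketched as a simple check of whether every interval is an integer multiple of the smallest one. The two six-click measures below are hypothetical: only the ratio sets (1 : 2 : 4 : 6 and 1 : 3.43 : 5.43 : 8.57) and the 3-second measure duration come from the text, whereas the order and count of the intervals within each measure are assumptions made for illustration.

```python
import numpy as np

def interval_ratios(intervals_ms):
    """Ratio of each inter-onset interval to the smallest interval."""
    intervals = np.asarray(intervals_ms, dtype=float)
    return intervals / intervals.min()

def has_integer_ratios(intervals_ms, tol=0.01):
    """True if every interval is (nearly) an integer multiple of the smallest."""
    ratios = interval_ratios(intervals_ms)
    return bool(np.all(np.abs(ratios - np.round(ratios)) < tol))

MEASURE_MS = 3000.0  # measure duration from the text

# Hypothetical high-metricality measure: ratios 2, 1, 1, 4, 6, 2 of 187.5 ms.
high_metrical = np.array([375.0, 187.5, 187.5, 750.0, 1125.0, 375.0])

# Hypothetical low-metricality measure built from the reported ratio set,
# rescaled so that the six intervals again fill one 3000 ms measure.
low_ratios = np.array([3.43, 1.0, 1.0, 5.43, 8.57, 3.43])
low_metrical = MEASURE_MS * low_ratios / low_ratios.sum()

high_is_metrical = has_integer_ratios(high_metrical)  # True
low_is_metrical = has_integer_ratios(low_metrical)    # False
```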
For the two conditions with high regularity, one measure of high and low metricality, respectively, was looped for the duration of the interaction. For the two conditions with low regularity, all five measures with high and low metricality, separately, were presented in random order. The measures selected for the high regularity conditions were rotated across dyads so as to address a potential measure-specific effect.
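The regularity manipulation can be sketched as two ways of sequencing measures over a conversation. The sketch assumes that "random order" means independent random draws from the five measures; the source does not say whether draws were independent or shuffled in blocks, and the function names are illustrative.

```python
import random

MEASURE_S = 3             # measure duration in seconds (from the text)
CONVERSATION_S = 5 * 60   # conversation duration in seconds (from the text)
N_MEASURES = CONVERSATION_S // MEASURE_S  # measures needed to fill 5 minutes

def high_regularity_sequence(measure_id, n=N_MEASURES):
    """Loop a single measure for the whole conversation."""
    return [measure_id] * n

def low_regularity_sequence(measure_ids, n=N_MEASURES, seed=None):
    """Draw measures at random (assumption: independent draws)."""
    rng = random.Random(seed)
    return [rng.choice(measure_ids) for _ in range(n)]

looped = high_regularity_sequence(measure_id=2)
shuffled = low_regularity_sequence(measure_ids=[0, 1, 2, 3, 4], seed=1)
```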
The discussion topics used for the interactions were developed based on informal student feedback and were intended to be generally interesting. They included (i) “So far, what university courses do you think are the most interesting?,” (ii) “Imagine if you were a lecturer of a university course, what would you like to teach to the students?,” (iii) “If you could hold an event at our university, what event would it be?,” (iv) “In your opinion, what is an ideal university campus?,” and (v) “In your opinion, what important skills should a university teach its students?” The assignment of discussion topics to backgrounds was rotated across dyads so as to avoid confounding topic with background effects.
Procedure
Prior to the experiment, participants completed an informed consent process and were then asked to fill in the Barcelona Music Reward Questionnaire. Subsequently, they were seated on two chairs positioned at a 90° angle. One camera was placed diagonally from each chair trained on one of the two participants, but recording both of them at a rate of 25 frames per second. Participants were aware of being recorded. Prior to each interaction, participants were given a sheet of paper with a discussion topic. They gave a signal to the experimenter when they were ready to begin. At this point, the experimenter started the video recording and rhythmic background, if applicable. After 5 minutes, the background, if any, stopped and the experimenter signaled to the participants that the trial was over. The experimenter then provided each participant with a rating sheet on which they scored the subjective pleasantness of the interaction on a 7-point scale ranging from very unpleasant to very pleasant.
Participants completed five interactions with post-interaction rating—one for each of the background conditions. The order of background conditions was counterbalanced across dyads using a Latin square design.
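One way to realize such counterbalancing is a cyclic Latin square, sketched below. The source does not state which Latin square was used, so the cyclic construction, the condition labels, and the assignment of dyads to rows are all assumptions.

```python
CONDITIONS = [
    "silence",
    "high metricality / high regularity",
    "high metricality / low regularity",
    "low metricality / high regularity",
    "low metricality / low regularity",
]

def cyclic_latin_square(items):
    """Each item appears exactly once in every row and every column."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]

def order_for_dyad(dyad_index, square):
    """Assign dyads to rows of the square in rotation."""
    return square[dyad_index % len(square)]

square = cyclic_latin_square(CONDITIONS)
first_dyad_order = order_for_dyad(0, square)
sixth_dyad_order = order_for_dyad(5, square)  # wraps around to the first row
```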
Data Analysis
Please refer to Figure 2 for an overview of the data analysis steps. First, videos trained on left and right participants in a dyad were synchronized by matching their audio tracks and trimmed to 7375 frames. Using Matlab, we defined for each video two regions of interest (ROI) encompassing the left and right participant, respectively, by finding the negative peak in accumulative motion that vertically separated the two participants in the video display. The location of the separator was visually inspected for each video to ensure that it was accurate. For each ROI, we converted recordings from color into grayscale and determined the number of pixels changing from frame to frame. We then set pixel values of 20 or less to 0 so as to avoid visual noise falsely being identified as movement. The resulting values from the two left and the two right ROIs from both videos were then averaged, producing two motion energy time series, one for each participant.
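The frame-differencing step can be sketched as follows. The sketch assumes that the threshold of 20 applies to the absolute grayscale change of each pixel between consecutive frames, which is one reading of the description above. The authors worked in Matlab; the NumPy function `motion_energy` is a hypothetical stand-in.

```python
import numpy as np

def motion_energy(frames, threshold=20):
    """Count pixels whose grayscale value changes by more than `threshold`.

    frames: array of shape (n_frames, height, width), grayscale values 0-255.
    Returns one count per frame transition (length n_frames - 1).
    """
    # Use a signed type so that differences of uint8 frames cannot wrap around.
    frames = np.asarray(frames, dtype=np.int16)
    diffs = np.abs(np.diff(frames, axis=0))
    diffs[diffs <= threshold] = 0  # treat small changes as visual noise
    return np.count_nonzero(diffs, axis=(1, 2))

# Toy ROI (hypothetical): three 4 x 4 frames.
frames = np.zeros((3, 4, 4), dtype=np.uint8)
frames[1:, 0, 0] = 100  # large change between frame 0 and frame 1
frames[2, 1, 1] = 10    # small change between frame 1 and frame 2
energy = motion_energy(frames)
```

In the toy example, the first transition contributes one changed pixel, whereas the sub-threshold change in the second transition is discarded.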

Figure 2. Data handling overview.
We added zeros to the beginning and end of each time series at half the signal length to increase the duration of our input data, which in turn helped reduce artifacts associated with the frequency decomposition at the signal edges where data is limited. The zero-padded data was then lightly smoothed using a 5-frame running average and subjected to Matlab’s coherence function with its default settings yielding a coherence time-frequency matrix for each dyad. Additionally, we computed a Morlet wavelet transformation using Matlab’s wt function and obtained the participant-specific oscillatory power for data points in the coherence time-frequency space by squaring the wt output. Both approaches rely on Matlab’s morlet function as a mother function which generates a wavelet with 5 periods. The power value we obtained indicated the strength or intensity with which a given frequency was present at a given moment in time for each participant.
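The padding and smoothing steps can be sketched as below. The sketch reads "at half the signal length" as padding half the signal length at each end, which is an assumption, and it does not reproduce the wavelet transformation itself. Function names are illustrative.

```python
import numpy as np

def zero_pad_half(signal):
    """Pad zeros amounting to half the signal length at each end (assumed reading)."""
    signal = np.asarray(signal, dtype=float)
    pad = len(signal) // 2
    return np.pad(signal, (pad, pad), mode="constant", constant_values=0.0)

def running_average(signal, window=5):
    """Centered running average over `window` frames."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")

# Toy motion energy series (hypothetical) with the trimmed length from the text.
series = np.random.default_rng(1).random(7375)
padded = zero_pad_half(series)
smoothed = running_average(padded)
```

Padding before smoothing, as in the text, means that the running average tapers the signal into the zeros at both edges rather than ending abruptly.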
We excluded frequencies below 0.0095 Hz, which fell into the cone of influence (i.e., distortion of estimates close to signal end points where zeros had been added). Moreover, as visual examination of oscillatory power revealed that motion energy was most pronounced in frequencies below 1 Hz, we restricted all analyses to this low frequency range (please refer to the Supplemental Materials for additional analyses of frequencies up to 4 Hz). Specifically, we examined the average of frequencies between 0.0095 and 1 Hz and separately explored meter (0.33 Hz) and beat frequency (1.33 Hz) of the background rhythms. The former frequency range maps onto earlier work on interactional synchrony (Fujiwara & Daibo, 2016; Fujiwara et al., 2019; Wiltshire et al., 2019), whereas the latter two frequencies align with past studies on synchronizing to music (Escoffier et al., 2015; Schirmer, Wijaya, et al., 2021) and present a comfortable tempo for listeners to move along (Schirmer et al., 2020). Incidentally, they also converge with more recent evidence that frequencies between 0.5 and 1.5 Hz are particularly relevant for social bonding between interaction partners (Fujiwara et al., 2020).
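The two background-specific frequencies follow directly from the stimulus timing reported earlier (3-second measures, a beat every 750 ms, video at 25 frames per second). A short worked calculation, with variable names chosen for illustration:

```python
MEASURE_S = 3.0   # measure duration in seconds
BEAT_S = 0.750    # beat period in seconds
FPS = 25          # video frame rate

meter_hz = 1.0 / MEASURE_S          # about 0.33 Hz
beat_hz = 1.0 / BEAT_S              # about 1.33 Hz
harmonic = beat_hz / meter_hz       # the beat is the 4th harmonic of the meter

frames_per_measure = MEASURE_S * FPS  # 75 frames
frames_per_beat = BEAT_S * FPS        # 18.75 frames
```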
For the analysis of background-specific frequencies, we identified the relevant frequency bin in the time-frequency space as well as its two neighbors one bin removed. We then obtained the average of the two neighbors and subtracted this average from the value associated with the target frequency. Thus, we aimed to eliminate general low-frequency effects and explored effects that were specific to just the meter and just the beat frequency. To elucidate dynamical coherence changes across the course of an interaction, we divided each interaction into halves, subsequently referred to as Time Bins, and averaged values within each half. We would have preferred a division into smaller bins. However, because zero padding is known to introduce reductions in power at the signal edges (Cohen, 2014), we wished to address this confound by keeping edge exposure constant across bins. To facilitate the detection of power or coherence outliers, we computed the participant and dyad means across conditions, respectively, and removed data points more than three standard deviations away from the mean. The remaining data points were normalized to a mean of 0 and a standard deviation of 1 and subjected to statistical analysis.
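These preprocessing steps can be sketched on toy data. Two readings are assumed and exposed as parameters: "one bin removed" is taken to mean the bins two positions away from the target (`offset=2`; set `offset=1` for the adjacent bins), and outliers are detected relative to the mean and standard deviation of whatever set of values is passed in. All function names are illustrative.

```python
import numpy as np

def neighbor_corrected(values, target_bin, offset=2):
    """Target-bin value minus the average of its two neighboring bins."""
    values = np.asarray(values, dtype=float)
    neighbors = (values[target_bin - offset] + values[target_bin + offset]) / 2.0
    return values[target_bin] - neighbors

def half_means(series):
    """Average a time course within its first and second half (Time Bins)."""
    series = np.asarray(series, dtype=float)
    mid = len(series) // 2
    return series[:mid].mean(), series[mid:].mean()

def remove_outliers_and_zscore(data, n_sd=3.0):
    """Drop points beyond n_sd standard deviations, then z-normalize the rest."""
    data = np.asarray(data, dtype=float)
    keep = np.abs(data - data.mean()) <= n_sd * data.std(ddof=1)
    kept = data[keep]
    return (kept - kept.mean()) / kept.std(ddof=1)

# Toy values (hypothetical): a coherence spectrum with a peak at bin 3,
# a short time course, and a sample with one extreme value.
peak = neighbor_corrected([0.2, 0.2, 0.2, 0.5, 0.2, 0.2, 0.2], target_bin=3)
first_half, second_half = half_means([1.0, 1.0, 3.0, 3.0])
cleaned = remove_outliers_and_zscore(np.append(np.arange(19.0), 1000.0))
```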
As mentioned above, motion coherence observed in original dyads could arise from social processes associated with the interaction or background-driven processes associated with the auditory rhythms. To more specifically examine the latter, we created 1000 permuted data sets in which we selected for the left individual from each dyad, randomly and with replacement, a right individual from another dyad. We then examined the resulting data sets by averaging the cross-wavelet coherence across all 1000 permutations and subjecting the result to the same statistical approach as used for the analysis of original dyad coherence.
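The construction of random dyads can be sketched as follows. The sketch only generates the pairings; computing coherence for each pairing and averaging across permutations would follow. Dyads are represented by integer indices, and the function names and seed are illustrative assumptions.

```python
import random

def pseudo_pairs(n_dyads, rng):
    """Pair each left individual with a right individual from another dyad.

    Sampling is with replacement: the same right individual may be
    selected for more than one left individual.
    """
    pairs = []
    for left in range(n_dyads):
        candidates = [d for d in range(n_dyads) if d != left]
        pairs.append((left, rng.choice(candidates)))
    return pairs

def permuted_datasets(n_dyads=49, n_permutations=1000, seed=0):
    """Generate the pairings for all permuted data sets."""
    rng = random.Random(seed)
    return [pseudo_pairs(n_dyads, rng) for _ in range(n_permutations)]

datasets = permuted_datasets()
```

Because every left individual keeps the background condition of the original recording, coherence that survives in these pairings cannot stem from the interaction itself.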
Statistics were conducted in R (R Core Team, 2015). To account for the nesting of participants in dyads, to accommodate the mixing of continuous and categorical variables, to facilitate dealing with missing data points (i.e., outliers), and to flexibly model effects (excluding those of no interest), we opted for a mixed effects modeling approach. Please note, however, that because we had only one data point per condition, temporal bin, and participant (rating) or dyad (coherence), we could not model random slopes, but only random intercepts. Modeling was done using the mixed function from the afex package (Singmann et al., 2019) in case of a linear dependent variable (coherence) and with the clmm function from the ordinal package (Christensen, 2018) in case of an ordinal dependent variable (rating). If models failed to converge, we used an ANOVA/ANCOVA and applied Greenhouse-Geisser correction in cases where the assumption of sphericity had been violated. Follow-up testing of significant interactions was done using the emmeans package (Lenth, 2018) with the Bonferroni correction for multiple comparisons.
Results
Do Metricality and Regularity Prompt Synchronizing in Rhythm-Specific Frequencies?
A first set of analyses explored whether participants synchronized in rhythm-specific frequencies. Because changes in power may confound coherence effects (Cohen, 2014), normalized power was included as a continuous control variable (main effects and interactions) in all coherence analyses (for analyses without this covariate please refer to the Supplemental Materials). In an effort to deal with the non-orthogonal nature of our design, we first compared the four rhythms against each other and then tested differences between each of the four rhythms and silence. We refer to these two steps as rhythm and silence analysis. The rhythm analysis included Power, Metricality (high/low), Regularity (high/low), Time Bin (1/2; first and second half of the conversation), and all interactions as fixed effects and the intercepts of dyads as the random effects. This analysis excluded the silent background condition. Effects of interest were those involving Metricality and/or Regularity. For the silence analysis, we subjected the relevant dependent measures to individual mixed models with Power, Block (rhythm/silence), Time Bin (1/2; first and second half of the conversation) and all interactions as fixed effects and the intercepts of dyads as the random effects. Of interest was the factor Block only (main/interaction effects). To both models, the factor Frequency (meter/beat) was added as a fixed effect of no interest and without modeling interactions. This factor was included because we wished to include data for both frequency measures while controlling our Type 1 error. Please note that because the beat is a harmonic of the meter, beat and meter frequencies have largely comparable effects on synchronizing as established previously (Schirmer, Wijaya, et al., 2021) and confirmed here via exploratory analysis.
Original dyads
The results are illustrated in Figure 3. As expected, there were no metricality and regularity effects in original dyads for rhythm-specific frequencies. The rhythm analysis yielded only non-significant effects (ps > .21). Additionally, the silence analysis returned fairly similar results across rhythms. There were marginal or significant Block effects with high metricality and high regularity (F[1,335] = 6.27, p = .013), high metricality and low regularity (F[1,383] = 3.02, p = .083), low metricality and high regularity (F[1,383] = 4.47, p = .035), and low metricality and low regularity (F[1,336] = 3.76, p = .053). In all cases, synchronizing in rhythm-specific frequencies was reduced when there was sound as compared to silence. For the rhythm with low metricality and low regularity only, the Block effect marginally interacted with Time Bin (F[1,336] = 3.5, p = .062).

Figure 3. Mean motion coherence between members of a dyad as a function of background condition. (A) Time-frequency plots. The frequency bins returned from the cross-wavelet coherence analysis are shown on the y-axis. The x-axis shows the time in frames (1 frame = 40 ms). Changes in coherence are illustrated as changes in color with warmer colors indexing greater motion synchrony. (B) Mean coherence for the average of the background rhythm’s meter and beat frequency. (C) Mean coherence for the average of frequencies below 1 Hz.
Random dyads
Because mixed models failed to converge, we pursued rhythm effects in data averaged across permutations with an ANCOVA as implemented in the ez package (Lawrence, 2016). The repeated measures factors included Metricality, Regularity and Time Bin for the rhythm analysis and Block and Time Bin for the silence analysis. Additionally, a permuted dyad’s condition-wise power mean was added as a repeated measures continuous covariate.
Supporting our first hypothesis, the rhythm analysis yielded a main effect of Metricality (F[1,48] = 4.6, p = .037, Gη2 = .006) and an interaction of Metricality, Regularity and Time Bin (F[1,48] = 5.27, p = .026, Gη2 = .005). Exploration of the first half of the conversation identified a significant Metricality effect (F[1,48] = 9.97, p = .003, Gη2 = .025) indicating that coherence was greater with high as compared to low metricality. The Regularity effect (F[1,48] = 3.04, p = .087, Gη2 = .007) and the interaction of Metricality and Regularity (F[1,48] = 3.67, p = .061, Gη2 = .01) were only marginally significant but directionally in line with expectation. There were no significant effects in the second half of the conversation (ps > .661).
Results of the silence analysis were less clear. They yielded a significant Block by Time Bin interaction for the rhythm with low metricality and low regularity (F[1,48] = 9.83, p = .003, Gη2 = .049) indicating that coherence was lower than for silence in the first (F[1,48] = 5.42, p = .024, Gη2 = .022), but not the second half of a conversation (p = .984). All other effects were non-significant (ps > .47).
To summarize, the present data partially support the first hypothesis. High metricality and, marginally, high regularity enhanced synchronizing in rhythm-specific frequencies relative to low metricality and low regularity. As expected, these effects emerged only in random dyads in which social processes had been neutralized. Notably, however, high metricality and regularity did not enhance synchronizing over silence. Instead, low metricality and low regularity appeared to dampen synchronizing and did so only at the beginning of the conversation.
Do Metricality and Regularity Enhance Synchrony in Social Interactions?
As a second step, we explored coherence across all frequencies below 1 Hz. Statistical modeling followed what was described above. Again, the rhythm analysis included Power, Metricality (high/low), Regularity (high/low), Time Bin (1/2), and all interactions as fixed effects, whereas the silence analysis included Power, Block (rhythm/silence), Time Bin (1/2) and all interactions as fixed effects.
Original Dyads
Our second hypothesis was that metricality and regularity enhance motion coherence between interaction partners in typical interaction frequencies. In line with this, the rhythm analysis produced an interaction of Metricality and Regularity (F[1,321] = 14.18, p < .001) with all other effects being non-significant (ps > .304). Bonferroni corrected follow-up comparisons showed that when regularity was high, metricality further enhanced coherence (ß = .41, SE = .13, t = 3.21, pB = .0015). Contrary to prediction, however, when regularity was low, metricality reduced coherence (ß = −.27, SE = .13, t = −2.12, pB = .035). The silence analysis corroborated these results. The Block effect was significant for the rhythm high in both metricality and regularity (F[1,38] = 5.26, p = .023) and marginally significant for the rhythm low in both (F[1,133] = 3.84, p = .052), but not for the other rhythms (ps > .237).
Random Dyads
The rhythm analysis returned a Metricality main effect (F[1,330] = 6.02, p = .015), indicating that coherence was lower for high as compared to low metrical rhythms. Thus, non-social rhythmic processes de-emphasized alignment in the low frequencies thought relevant for social interactions. All other effects of this analysis (ps > .142) and of the silence analysis (ps > .17) were non-significant.
To summarize, our data partially support the second hypothesis. High metricality and regularity enhanced synchronizing relative to other conditions. However, we also noted a synchronizing benefit when metricality and regularity were low. Thus, not just high but also low temporal predictability of the auditory background facilitates motion alignment between dyad members. That these effects were absent in the analysis of random dyads implies that they emerge through social processes in interaction with the partner.
Do Metricality and Regularity Enhance Conversation Pleasantness?
Interaction evaluations are illustrated in Figure 4. First, we conducted a multi-factorial analysis with Metricality and Regularity as the fixed effects and excluding the silent condition. Again, the random effects term comprised the intercept of participants nested in dyads. Contrary to expectation, the result was non-significant (ps > .111). Second, we conducted one-way repeated measures analyses testing whether rhythmic backgrounds increase conversation pleasantness relative to silence. Block had two levels and served as the fixed effect, and the intercepts of participants nested in dyads served as the random effects.
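For orientation, the first, multi-factorial model described above can be written out as a linear predictor. This is a sketch under stated assumptions: the symbols are introduced here for illustration only, effect coding of the two factors is assumed, and the response distribution and link function are not specified in this section, so only the predictor is shown. Here i indexes participants, j dyads, and c background conditions.

```latex
\eta_{ijc} = \beta_0 + \beta_1 M_c + \beta_2 R_c + \beta_3 M_c R_c + u_j + v_{i(j)},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2), \quad v_{i(j)} \sim \mathcal{N}(0, \sigma_v^2)
```

M_c and R_c code the metricality and regularity of condition c, u_j is the random intercept of dyad j, and v_{i(j)} is the random intercept of participant i nested in dyad j.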

Figure 4. Interaction pleasantness ratings.
The results contradicted our hypothesis. Highly regular backgrounds reduced perceived pleasantness relative to silence (metricality high: ß = .72, SE = .28, Z = 2.55, p = .011; metricality low: ß = .55, SE = .27, Z = 2.04, p = .041). Backgrounds with low regularity had no significant effect (ps > .117).
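The reported test statistics can be checked with a short calculation. The sketch below assumes that the Z values are Wald statistics (estimate divided by standard error) referred to a standard normal distribution; this is an assumption, as the text does not name the test. The function names are ours.

```python
import math

def wald_z(estimate, std_error):
    """Wald statistic: estimate divided by its standard error."""
    return estimate / std_error

def two_sided_p_from_z(z):
    """Two-sided p-value for a standard-normal test statistic."""
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1)
    return math.erfc(abs(z) / math.sqrt(2))

# Values as reported in the text (rounded to two decimals in the source).
reported = {
    "metricality high": {"beta": 0.72, "se": 0.28, "z": 2.55},
    "metricality low": {"beta": 0.55, "se": 0.27, "z": 2.04},
}
for label, r in reported.items():
    z_from_rounded = wald_z(r["beta"], r["se"])
    p = two_sided_p_from_z(r["z"])
    print(f"{label}: z from rounded inputs = {z_from_rounded:.2f}, p from reported Z = {p:.3f}")
```

Under this assumption, the p-values computed from the reported Z values round to .011 and .041, in agreement with the text. Small differences between the reported Z and the ratio of the rounded estimate and standard error are expected from rounding.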
In sum, the rating data indicated that high regularity was associated with reduced conversation pleasantness. These results are the opposite of what we had predicted.
Discussion
Often, when we hear a rhythm, we automatically move to its beat. This form of rhythmical entrainment emerges early in infancy (Zentner & Eerola, 2010) and is something we share with a few other species (Patel, 2014; Wilson & Cook, 2016). Here, we explored its relevance for human social interactions and found, somewhat surprisingly, that it might get in the way when two individuals engage in conversation. In the following sections, we will discuss our results in some detail focusing on how rhythms shape dynamic dyadic activity and considering the relevant temporal features underpinning the observed rhythmic effects.
Unattended Musical Rhythms Guide Motion in Dyadic Interactions
External rhythms, such as those present in music or in the motion of an interaction partner, affect one’s own rhythms in complex ways. Here we tackled this complexity by examining both the specific frequencies of background sounds as well as the broad low frequency range typically linked to dyadic motion synchrony (Fujiwara & Daibo, 2016; Fujiwara et al., 2019; Wiltshire et al., 2019). Moreover, each variable was pursued in original dyads and random dyads in which partners were randomly paired for coherence analysis. We reasoned that the former entailed both social processes tied to the interactive exchange and non-social processes triggered by the background manipulation (Grahn & Brett, 2007; Schirmer, Wijaya, et al., 2021). By contrast, the latter had social processes removed therefore allowing us to gauge non-social synchronizing arising from the background and the broader interaction context.
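The logic of random dyads can be illustrated with a small sketch. The pairing scheme shown here (a cyclic shift over a shuffled dyad order) is one simple way to guarantee that nobody is paired with their actual partner; it is an assumption made for illustration and not necessarily the permutation procedure used in the study. The function name and participant labels are hypothetical.

```python
import random

def make_random_dyads(dyads, rng):
    """Pair member A of each dyad with member B of a different dyad.

    `dyads` is a list of (member_a, member_b) tuples; at least two are required.
    """
    if len(dyads) < 2:
        raise ValueError("need at least two dyads to form random pairings")
    order = list(range(len(dyads)))
    rng.shuffle(order)
    pseudo = []
    for pos, idx in enumerate(order):
        # Take the partner from the next dyad in the shuffled cycle, so that
        # the partner never comes from the same original dyad.
        partner_idx = order[(pos + 1) % len(order)]
        pseudo.append((dyads[idx][0], dyads[partner_idx][1]))
    return pseudo

# Hypothetical participant labels standing in for motion time series.
original = [("A1", "B1"), ("A2", "B2"), ("A3", "B3"), ("A4", "B4")]
rng = random.Random(0)  # fixed seed so the illustration is reproducible
random_dyads = make_random_dyads(original, rng)
```

Because the members of a random dyad never interacted, any coherence between their motion time series cannot stem from the interactive exchange and must reflect the shared background and the broader interaction context.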
Our analysis of frequencies present in the background rhythms returned contrasting results for original and random dyads. In original dyads, synchronizing in these frequencies was suppressed across all rhythms relative to silence while the rhythms themselves did not differ. In random dyads, however, none of the rhythms differed from silence but they differed from each other. Specifically, there was enhanced synchronizing when metricality and, marginally, regularity were high as compared to low.
Together these results imply that dyad members were influenced by the temporal structure of auditory backgrounds. As expected, they showed increased synchronizing in background frequencies when metricality and regularity approximated what is typical for music. In other words, metricality and, marginally, regularity primed participants to move at dominant background frequencies and thus synchronously with each other. This is interesting as this kind of motion entrainment has so far only been documented when participants were explicitly asked to move in time with an auditory stimulus (Chauvigné et al., 2019; Grahn & Brett, 2007; Schirmer et al., 2020). We show here that it also emerges in response to an unattended and task-irrelevant sound sequence. Moreover, we find that it is accompanied by an inhibitory effect of sound on the kind of motion alignment that emerges dynamically via social processes at the rhythms’ dominant frequencies. Thus, background rhythms likely interfered with the spontaneous movement coordination that characterizes dyadic interactions.
Unattended Musical Rhythms Impair Interactional Outcomes
That background rhythms were interfering and potentially unwelcome agrees with the results we obtained when analyzing synchrony in low, interaction-relevant frequencies and when exploring conversation pleasantness. Synchrony effects again contrasted for original and random dyads. In original dyads, synchrony was highest when background metricality and regularity were high. Thus, as expected, musical rhythms facilitated motion alignment between interaction partners. However, unexpectedly, this was accompanied by the lowest conversation pleasantness. Moreover, an analysis of random dyads revealed that high background metricality was associated with a suppression of non-social low-frequency synchronizing, which might normally arise from interaction-unspecific aspects of the situation (e.g., the presence of another person, typical motion patterns associated with talking).
Together, these findings conflict with the idea that musical rhythms promote positive interaction outcomes. Although musical rhythms prompted listeners to move along and enhanced temporal coordination with the interaction partner, they also engendered internally conflicting motion rhythms and negative rather than positive affect. One possible reason for this may be that musically aligned motion conflicts with the temporal dynamics of human dialog. Indeed, recent perspectives on interactional synchrony emphasize the importance of both temporally coordinated and uncoordinated activity (Mayo & Gordon, 2020; Schirmer, Fairhurst, et al., 2021). Accordingly, interactional success depends on the balance that individuals strike between moving with the partner and withdrawing from synchrony. Moreover, necessary temporal flexibility might be compromised in the presence of a fixed external structure as inherent in a musical rhythm. As a consequence, the interaction may not feel natural and/or trigger compensatory mechanisms that are mentally costly and increase overall conversation effort. This in turn might impair the affective evaluation of what transpired.
Preliminary support for these possibilities comes from a follow-up study (Supplemental Materials) in which participants, undergoing the same manipulation as in the primary study, rated their perceived level of synchrony and distraction. Interestingly, although actual synchrony was highest, subjective synchrony was lowest when metricality and regularity were high suggesting that participants may have felt “off” when coordinating with their partners under such constrained conditions. Additionally, conversations on a highly metrical and regular background but also on other backgrounds felt more distracting than conversations held in silence.
Metricality, Regularity and the Benefit of Non-Musical Sounds
Existing research examining the synchronizing effect of musical rhythms has emphasized their metricality and paid scant attention to regularity (for a review see Hoehl et al., 2021). However, like metricality, regularity characterizes musical pieces and is ever more prominent the more popular a given piece is. Indeed, repeating the same temporal structure, as is typical for verses and refrains, helps reinforce a song’s beat. Among other things, this is evident from the speech-to-song illusion whereby the repetition of a spoken phrase creates the perception of the speaker singing (Deutsch et al., 2011). Additionally, it is implied by recent evidence generalizing this effect to other types of sounds (e.g., dripping water) and showing that it persists when melodic information is artificially removed (Rowland et al., 2019).
Corroborating and extending this work, we found that regularity, alongside metricality, influenced participant motion. Although metricality effects were more pervasive, regularity marginally enhanced motion alignment in the background’s dominant frequencies and thus tended to facilitate the basic motor entrainment effect described above. Additionally, it interacted with metricality in shaping interactional synchrony more broadly in the low frequency range. Synchrony was maximal when both metricality and regularity were high or when both were low.
While the former result was expected, the latter came as a surprise and implies that non-musical rhythms may also facilitate temporal alignment between interaction partners. Importantly, we found this happened without significant affective costs. Previous research suggests that auditory backgrounds are arousing and benefit overall engagement with a task (Escoffier et al., 2010; Husain et al., 2002). Thus, in the present study, rhythms with low metricality and low regularity might have stimulated individuals to engage in conversation, while interfering less than a musical rhythm with the interaction’s natural temporal dynamics (Mayo & Gordon, 2020; Schirmer, Fairhurst, et al., 2021).
Caveats and Future Directions
The synchronizing of behaviors and minds in social settings is a complex phenomenon that depends on many aspects both external and internal to the interacting individuals. In order to shed light on synchronizing processes and functions, we must rely on well-controlled experiments. By manipulating one variable while keeping other factors constant we gain insight into how this variable, in our case musical rhythm, shapes synchronizing. Moreover, by integrating different experimental findings, we acquire a mechanistic understanding allowing us to predict how synchronizing and interactions more generally unfold as a function of their context.
Nevertheless, a psychological phenomenon cannot be reduced to laboratory data. Indeed, after careful experimentation it must be pursued in more and more complex settings and, eventually, in real life. Thus, it is important that future efforts address not one but multiple internal and external variables of synchronizing. For example, one may wish to ascertain whether rhythmic background effects are comparable when individuals are friends or strangers (Fujiwara et al., 2020). Additionally, one may aim to examine synchronizing to more musical background stimuli with melodic and harmonic elements added to the rhythm. Last, it would be important to study musically modulated synchronizing in the kind of casual settings in which such synchronizing typically occurs.
Notably, the casual consumption of music is a fairly recent phenomenon although music likely predated or accompanied the evolution of speech (Mithen, 2005; Montagu, 2017).
Concluding Remarks
To conclude, this study explored whether and how musical background rhythms shape unstructured human dialog. Although done in a prompted manner, with a transparent video set-up, and a non-melodic stimulus, results shed light on how musical rhythms organize interactional synchrony and emergent affect. Moreover, they highlight that although musical rhythms facilitate interactional synchrony, such facilitation comes at a cost.
Superficially, these data conflict with previous work linking interactional synchrony with positive affect (Tschacher et al., 2014) and underscoring the social benefits of music (Launay et al., 2016; Tarr et al., 2014). However, previous work has focused on spontaneous human dialog in the absence of an external temporal structure (e.g., Fujiwara & Daibo, 2016) or examined music in the context of a shared motion or expressive goal (e.g., Weinstein et al., 2016). To the best of our knowledge, ours is the first attempt to combine these lines of research. The results imply that musical rhythms interfere with the spontaneous time course of conversations and that silence or non-musical sound sequences are more beneficial in fostering dyadic exchange.
Supplemental Material
Supplemental material, sj-doc-1-crx-10.1177_00936502211015900 for When the Music’s No Good: Rhythms Prompt Interactional Synchrony But Impair Affective Communication Outcomes by Annett Schirmer, Clive Lo and Maria Wijaya in Communication Research
Footnotes
Acknowledgements
We thank Jenni Vourinen for help with data collection and Trevor Penney for comments on an earlier draft of the manuscript. This work was supported by a GRF grant awarded by the Hong Kong Research Grants Council to Annett Schirmer (14612318).
Data Availability
The authors are willing to share some of the data from this research upon reasonable request. The video recordings identify our participants, are confidential, and cannot be shared. For requests, please email the corresponding author.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by a GRF grant awarded by the Hong Kong Research Grants Council to Annett Schirmer (14612318).
