Abstract
Introduction
Most of us can easily track the beat of rhythmic auditory events, such as music, and move along with it. This can be seen when we synchronize our movement to the rhythm of music while dancing, marching, or doing sport activities (e.g., jogging to the beat of music). This ability is widespread in the general healthy population (Repp, 2010; Sowiński & Dalla Bella, 2013), with just a few exceptions (e.g., beat deafness, Bégel et al., 2017; Launay, Grube, & Stewart, 2014; Palmer, Lidji, & Peretz, 2014; Phillips-Silver et al., 2011; Sowiński & Dalla Bella, 2013). Moving to musical rhythm implies that listeners can extract the beat from an auditory sequence. The beat is defined as a perceived pulse that marks equally spaced points in time (Large & Jones, 1999; London, 2012), to which we usually move when we tap a finger or foot, or when we dance. Beat tracking can be tested in purely perceptual tasks (e.g., detecting a deviation from isochrony in a sequence of tones, Ehrlé & Samson, 2005) or in sensorimotor tasks (e.g., paced tapping to the sounds of a metronome or to music, Repp, 2005; Repp & Su, 2013). In the past few years, batteries including both perceptual and sensorimotor tasks have been developed for the evaluation of rhythmic and timing abilities, such as the Battery for the Assessment of Auditory Sensorimotor and Timing Abilities (BAASTA; Dalla Bella, Farrugia, Benoit et al., 2017) and the Harvard Beat Assessment Test (H-BAT; Fujii & Schlaug, 2013). These batteries are highly valuable because they allow the timing capacities of distinct populations to be characterized and inter-individual differences to be highlighted (Bégel et al., 2017; Benoit et al., 2014; Cochen De Cock et al., 2018; Dalla Bella, Benoit, Farrugia et al., 2017; Dalla Bella, Dotov, Bardy, & Cochen de Cock, 2018; Dalla Bella, Farrugia, Benoit et al., 2017; Falk, Müller, & Dalla Bella, 2015; Puyjarinet et al., 2017).
Rhythmic skills are sustained by a complex neuronal network. Notably, even in the absence of a motor response, mere extraction of the beat from an auditory signal recruits motor regions of the brain, such as the basal ganglia, premotor cortex, pre-SMA, and the cerebellum (Chen, Penhune, & Zatorre, 2008a; Coull, Cheng, & Meck, 2011; Grahn & Brett, 2007; Grahn & Rowe, 2009), on top of perceptual regions (superior temporal gyrus; Chen, Penhune, & Zatorre, 2008b; Schwartze & Kotz, 2013; Thaut, 2003). When a motor response is coupled to an auditory rhythm this network extends to sensorimotor integration areas (e.g., dorsal premotor cortex; Chen, Zatorre, & Penhune, 2006; Coull, Cheng, & Meck, 2011; Zatorre, Chen, & Penhune, 2007). Malfunctioning of these networks typically affects rhythmic skills in neurodegenerative disorders (e.g., Parkinson’s disease; Benoit et al., 2014; Grahn & Brett, 2009; Jones & Jahanshahi, 2014; Pastor, Artieda, Jahanshahi, & Obeso, 1992; Spencer & Ivry, 2005) or neurodevelopmental deficits (ADHD, Noreika, Falter, & Rubia, 2013; Puyjarinet et al., 2017; stuttering, Falk, Müller, & Dalla Bella, 2015; autism spectrum disorder, Allman, Pelphrey, & Meck, 2012; speech and language impairments, Corriveau & Goswami, 2011; Corriveau, Pasquini & Goswami, 2007; Dalla Bella, Dotov, Bardy, & Cochen de Cock, 2018; Goswami, 2011; Huss, Verney, Fosker, Mead, & Goswami, 2011). Timing and rhythmic skills can also be selectively deficient in healthy adults (beat deafness, Bégel et al., 2017; Launay, Grube, & Stewart, 2014; Palmer, Lidji, & Peretz, 2014; Phillips-Silver et al., 2011; Sowiński & Dalla Bella, 2013; tone deafness, Dalla Bella & Peretz, 2003; Dalla Bella, Berkowska, & Sowiński, 2015). Interestingly, the ability to track the beat has been associated with other cognitive abilities such as working memory, sustained attention, or language and reading skills in children (Tierney & Kraus, 2013; Woodruff Carr, White-Schwoch, Tierney, Strait, & Kraus, 2014).
Altogether these studies indicate that there is a tight link between rhythmic skills and both motor and cognitive functions. Because of that link, one may expect that an improvement in rhythmic skills would positively affect both motor and cognitive functioning. Rhythmic training may thus provide a viable strategy to improve other functions above and beyond rhythm. This possibility finds some confirmation in studies showing the beneficial effect of rhythmic stimulation on motor functions and cognition. For example, rhythmic training in which patients with movement disorders, such as Parkinson’s disease, walk together with a metronome or music improves their gait, by increasing speed and stride length (Dalla Bella, Benoit, Farrugia et al., 2017; de Dreu, Van Der Wilk, Poppe, Kwakkel, & Van Wegen, 2012; Thaut et al., 1996; Spaulding et al., 2013), and reduces their deficits in rhythm perception and production (Benoit et al., 2014; Dalla Bella, Benoit, Farrugia et al., 2017). Notably, the positive response to rhythmic stimulation depends on patients’ perceptual and sensorimotor rhythmic skills, thus pointing to a strong link between rhythm processing and motor control (Dalla Bella, Benoit, Farrugia et al., 2017; Dalla Bella, Dotov, Bardy et al., 2018; Cochen de Cock et al., 2018). In addition, rhythmic stimulation (e.g., rhythmic priming) can be used for improving speech perception in children with dyslexia and in children with specific language impairment (e.g., Przybylski et al., 2013; Schön & Tillmann, 2015).
In sum, training rhythmic skills appears to be a promising avenue for improving movement and cognition in a variety of populations. To the best of our knowledge, no systematic protocol for training selectively rhythmic skills has been proposed so far. The goal of this study was to devise and test a new rhythm training protocol which is implemented as a serious game exploiting new mobile technologies. A serious game is a game designed specifically for training and education purposes, such as providing a dedicated training for rehabilitation/remediation, in an entertaining and motivating fashion, while being widely accessible to the targeted public and remaining low-cost (Annetta, 2010; Kato, 2012). Over the past two decades, serious games have been extensively used in therapy (for a review, see Rego, Moreira, & Reis, 2010). Several studies proved that serious games involving motor exercises have beneficial effects on movement capacities in stroke (Friedmann et al., 2014; Webster & Celik, 2014), in Parkinson’s disease (Barry, Galna, & Rochester, 2014; Harris, Rantalainen, Muthalib, Johnson, & Teo, 2015; Mendes et al., 2012), and in healthy older adults (Sun & Lee, 2013). Dedicated cognitive training, such as working memory or executive function training, via serious games has also yielded encouraging results over the past 10 years (e.g., Anguera et al., 2010; for review, see Lumsden, Edwards, Lawrence, Coyle, & Munafò, 2016; however, for a discussion on the limit of computerized cognitive training, see Owen et al., 2010).
There are a few examples of rhythmic games on the market, such as Guitar Hero® or Rhythm Heaven Fever®. Unfortunately, none of these games is specifically dedicated to rhythmic training or complies with the measurement standards needed for experimental work, as we highlighted in a recent survey (Bégel, Di Loreto, Seilles, & Dalla Bella, 2017). First, measures of rhythmic motor performance lack temporal precision in the presentation of stimuli and/or data acquisition. The output data are insufficient, since no measure of rhythmic performance is recorded. Second, there is usually no progression in the games based on the rhythmic features of the musical stimuli. Therefore, these games do not selectively train rhythmic skills. Finally, most of the games may be played without relying on auditory information. Visual cues displayed on the screen, such as images appearing rhythmically, are often sufficient to play the game, as the goal is to execute a movement in a given temporal window in reaction to these cues. For instance, in Guitar Hero, the player has to synchronize with moving circles when they reach a given position on the screen. These drawbacks make off-the-shelf rhythm games unsuitable for selective training of rhythmic skills.
Here we devised a new serious game for training perceptual and sensorimotor rhythmic skills, named
The study consists of two experiments. The goal of the first experiment is to select and validate the musical material. In particular, the experiment served to rank the musical stimuli from high to low beat saliency, tested with a tapping task. Beat saliency is critical for creating difficulty levels in the game. Players in both the perception and the tapping versions of the game need to extract the beat in order to complete the tasks with success. This is more difficult when the beat has low saliency than when the beat is very salient. When the beat is less salient, beat extraction will particularly recruit mechanisms devoted to the internal generation of the beat (Grahn & Rowe, 2009). The second experiment is a proof-of-concept pilot study, with the goal of testing usability of
Experiment 1
The goal of Experiment 1 is to select the musical excerpts for
Participants
Eighteen participants (5 females, mean age = 26,
Material and procedure
An initial set of 90 musical excerpts available in MIDI format in an online music repository (www.midiworld.com) was selected from three musical genres: 30 from classical music, 30 from jazz, and 30 from pop music. The choice of musical stimuli across different genres affords variety among the excerpts in terms of beat saliency. In addition, it has the advantage of making the game less monotonous and more attractive for players regardless of their musical preferences. For stimuli in which vocal performance was part of the excerpt, the voice was replaced by a melody with a piano timbre. All excerpts were rated by four members of the laboratory (two musicians), experts in timing and rhythm research, in terms of beat saliency, pleasantness, and familiarity. Beat saliency was rated on a seven-point scale (1 = the beat can be hardly perceived, 7 = the beat can be easily perceived). Similar scales were used to rate stimulus pleasantness (1 = not pleasant, 7 = very pleasant) and familiarity (1 = not familiar, 7 = very familiar). The stimuli were ranked based on the ratings of beat saliency, and assigned to three categories including 30 excerpts each: with (1) a highly salient beat (ratings between 5.75 and 6.75), (2) an averagely salient beat (ratings between 4.5 and 5.5), and (3) a beat with low saliency (ratings between 1.5 and 4). Within each category the five least pleasant excerpts were discarded, leading to 75 excerpts, 25 in each category. Rating scores for pleasantness, familiarity and beat saliency of the excerpts in the three categories were compared with a one-way repeated measures Analysis of Variance (ANOVA). The three stimulus categories significantly differed in terms of average beat saliency (mean = 2.76 for stimuli with low beat saliency, 5 for stimuli with average beat saliency, and 6.31 for stimuli with high beat saliency;
Analysis
Motor synchronization to the beat was analyzed with circular statistics (Fisher, 1993; for examples with tapping, see Dalla Bella & Sowiński, 2015; Kirschner & Tomasello, 2009). This method consists of representing the inter-beat interval (IBI) of the stimuli on a 360° polar scale. The timing of each finger tap relative to the beat is represented by an angle by comparing the time of the tap to the time of the nearest beat (Figure 1). Angles, treated as unitary vectors, are used to compute the resultant vector

Examples of two distributions of taps corresponding to two musical excerpts. The dots represent the distributions of the timing of the taps relative to the beat (= 0 degrees) for one participant. On the left, the dots occur in the vicinity of the beat, indicating high synchronization consistency (length of vector
Finally, in order to obtain an objective measure of beat saliency, for each excerpt we computed pulse clarity, based on the acoustic signal using the “pulse clarity” function in the MIR toolbox in Matlab (Lartillot & Toiviainen, 2007; Lartillot, Toiviainen, & Eerola, 2008). Large values in terms of pulse clarity indicate that the beat is particularly salient.
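The circular analysis of tapping described in this section can be illustrated with a minimal Python sketch. It is not the analysis code used in the study: the function names, the simulated tap times, and the 600-ms IBI are assumptions chosen for illustration. Each tap is converted to an angle relative to its nearest beat, and the length of the mean resultant vector (between 0 and 1) is taken as synchronization consistency.

```python
import math

def tap_angles(tap_times_ms, beat_times_ms, ibi_ms):
    """Angle (radians) of each tap relative to its nearest beat; 0 = on the beat."""
    angles = []
    for tap in tap_times_ms:
        nearest = min(beat_times_ms, key=lambda b: abs(tap - b))
        angles.append(2 * math.pi * (tap - nearest) / ibi_ms)
    return angles

def resultant_vector(angles):
    """Length and direction of the mean resultant of the unit vectors."""
    n = len(angles)
    c = sum(math.cos(a) for a in angles) / n
    s = sum(math.sin(a) for a in angles) / n
    return math.hypot(c, s), math.atan2(s, c)

# Hypothetical data: eight beats at a 600-ms IBI.
beats = [i * 600.0 for i in range(8)]
consistent = [b + 20.0 for b in beats]  # every tap 20 ms after the beat
offsets = [0, 150, 300, -150, 0, 150, 300, -150]
scattered = [b + d for b, d in zip(beats, offsets)]  # taps spread around the circle

r_consistent, theta_consistent = resultant_vector(tap_angles(consistent, beats, 600.0))
r_scattered, _ = resultant_vector(tap_angles(scattered, beats, 600.0))
```

In this toy example the constant 20-ms lag gives a vector length close to 1, whereas taps spread evenly around the circle give a length close to 0.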
Results and discussion
Twenty-two musical excerpts were rated below 4 in terms of pleasantness, and were therefore discarded. The final set of 53 musical excerpts was ranked from the easiest to the most difficult to synchronize with, based on synchronization consistency.
Musical excerpts ranked based on synchronization consistency, treated as an indicator of rhythmic difficulty, are presented in Table 1. It is worth noting that synchronization consistency is positively correlated with pulse clarity (
Description of musical excerpts. Vector length (between 0 and 1) represents synchronization
This set of 53 musical excerpts ranked for rhythmic difficulty was the musical database used to design a rhythm training protocol implemented in a serious game (
Experiment 2
This experiment is a proof-of-concept pilot study of the game
Participants
Thirty healthy young adults participated in the experiment (8 females, mean age = 24.67,
Experimental design
Training protocol: Rhythm Workers
Stimulus material
In addition to the musical excerpts selected in Experiment 1, nine metronome sequences (isochronous sequences of tones) and 37 rhythmic sequences were created. The metronome sequences were formed by 80 isochronously presented tones. Rhythmic sequences are temporal patterns of tones with different durations and with an underlying beat. There were 18 strongly metrical sequences and 19 weakly metrical sequences, defined based on the classification of Povel and Essens (1985; see also Patel, Iversen, Chen, & Repp, 2005) (see Table 2). The beat underlying strongly metrical sequences is typically easier to track than the beat underlying weakly metrical sequences (Patel et al., 2005). In both metronome and rhythmic sequences, the timbre of the tones was a woodblock percussion sound.
Metrical sequences. Tempos of the sequences were manipulated as follows: the first two sequences were presented at a tempo of 100 beats per minute (BPM; IBI = 600 ms). The tempos of the following sequences were progressively reduced or increased in steps of 10% of the original BPM value.
x = event onset.
. = silent position.
| = indicates that the following event or silent position is associated with a beat.
To vary rhythmic difficulty, the beat rate of metronome sequences and rhythmic patterns was manipulated. An IBI of 600 ms corresponds to the natural rate at which individuals tap, on average, in the absence of a pacing stimulus (Repp, 2005; Repp & Su, 2013). Stimuli with this beat rate are thus considered the easiest to tap along with. Difficulty was manipulated by progressively deviating from this optimal rate. We created sequences with IBIs which are 10%, 20%, 30% and 40% faster or slower than 600 ms for metronome sequences (IBI range: 360–840 ms), and 10%, 20%, 30%, 40% and 50% faster or slower than 600 ms for rhythmic sequences (range: 360–900 ms). Two strongly and weakly metrical sequences were created for each tempo. In order to make the game less monotonous, faster and slower stimuli were interleaved.
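As a worked illustration of this tempo manipulation, the following Python sketch derives the metronome IBIs from the 600-ms reference. It assumes that the percentage deviations are applied to the IBI, as the ranges reported above suggest; the function and constant names are hypothetical and are not taken from the game.

```python
BASE_IBI_MS = 600.0  # reference inter-beat interval (100 BPM)

def ibi_to_bpm(ibi_ms):
    """Convert an inter-beat interval in milliseconds to beats per minute."""
    return 60000.0 / ibi_ms

def deviated_ibis(max_step, step_pct=10):
    """IBIs deviating from the reference by multiples of step_pct, in both directions."""
    ibis = {BASE_IBI_MS}
    for k in range(1, max_step + 1):
        delta = BASE_IBI_MS * k * step_pct / 100
        ibis.add(BASE_IBI_MS - delta)  # shorter IBI, faster sequence
        ibis.add(BASE_IBI_MS + delta)  # longer IBI, slower sequence
    return sorted(ibis)

# Deviations of up to 40% give the nine metronome tempos (360 to 840 ms).
metronome_ibis = deviated_ibis(4)
```

With four 10% steps in each direction, the sketch yields nine IBIs between 360 and 840 ms, which matches the nine metronome sequences described above.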
Rhythm Workers
The goal of the game is to construct buildings. The construction of a building is associated with one of the stimuli presented above (metronome, rhythmic sequence, or music; each stimulus includes 80 beats) and corresponds to one level of the game. Ninety-nine levels were designed. These levels were grouped into nine degrees of difficulty, referred to as “worlds” (11 levels per world), as illustrated in Table 3. Ninety-nine stimuli were selected (53 musical excerpts, 9 metronome sequences, and 37 rhythmic sequences) and assigned to the different worlds, as can be seen in Table 3. To make the game interesting, and thus potentially motivating, the three types of stimuli were alternated within the same world as follows: the game started with a metronome sequence, and a further metronome sequence occurred after every 10 levels with the other stimuli. The other 10 stimuli (i.e., music and rhythmic sequences) were presented after each metronome sequence according to the following fixed order: 2 musical excerpts – 2 strongly-metrical sequences – 2 musical excerpts – 2 weakly-metrical sequences – 2 musical excerpts. This structure of the rhythmic training protocol was implemented in two versions of the game. In the perception version, the task is an adaptation of the Beat Alignment Test (BAT, Iversen & Patel, 2008) in which the player is asked to detect whether a sequence of percussion sounds (a metronome) is aligned with the beat of the stimulus. In the tapping version, the goal of the task is to tap to the beat of the stimulus as accurately as possible.
Structure of the rhythmic training protocol implemented in
M = music excerpt.
SM = strongly metrical sequence.
WM = weakly metrical sequence.
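The level structure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the game's implementation: the names are hypothetical, and the sketch applies the fixed order literally to all nine worlds. Doing so gives 54 music slots and 36 rhythmic-sequence slots, whereas the text reports 53 musical excerpts and 37 rhythmic sequences, so at least one level in the actual protocol must deviate from this pattern.

```python
from collections import Counter

# One world: a metronome level, then the fixed order of the ten other stimuli
# (M = music excerpt, SM = strongly metrical, WM = weakly metrical).
WORLD_PATTERN = ["metronome", "M", "M", "SM", "SM", "M", "M", "WM", "WM", "M", "M"]

def build_levels(n_worlds=9):
    """List every level with its world, position in the world, and stimulus type."""
    levels = []
    for world in range(1, n_worlds + 1):
        for position, kind in enumerate(WORLD_PATTERN, start=1):
            levels.append({"world": world, "level": position, "type": kind})
    return levels

levels = build_levels()
type_counts = Counter(level["type"] for level in levels)
```

The sketch reproduces the 99 levels, with 11 levels per world and one metronome sequence opening each world.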
The aesthetic quality of the building depended on the player’s performance (see Figure 2). Feedback about the performance was provided both during the performance of a level in real time, and at the end of the level (after the end of a stimulus). Real-time feedback was provided four times while the participant played a given level in both the perception and the tapping versions of the game. The four iterations correspond to the appearance of the 4 stories of the building. When the player tapped accurately to the beat (

Two examples of buildings generated by one player. From left to right are presented the four steps of the building construction (appearance of each of the four stories of the building). The aesthetic quality of the building depends on the player’s performance. A) Example of a bad performance (score = 10 points, 1 star). B) Example of a very good performance (score = 100, 5 stars).
Additional feedback about the performance was given at the end of each level. This was a final score between 5 and 100 points calculated based on participants’ overall performance in the level (for details, see below) and converted into a number of stars, from one (score < 70) to five (score > 95) stars. A performance leading to at least two stars was sufficient to unlock the next level within the same world. To unlock (and move to) the next world, the player had to gather at least 20 stars in the current world. Note that if the player could not obtain two stars after five trials at the same level, the next level was automatically unlocked. This process allowed a player who had particular difficulties with one level to move to the next level, with the possibility of training on the previous level within the same world later. Finally, if the participant completed all the worlds before the end of the two weeks, the game restarted from world 1, but in a slightly more difficult version, in which three stars, instead of two, were needed to unlock the subsequent level.
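The progression rules described above can be summarized in a short Python sketch. It is a hedged illustration with hypothetical function and parameter names. Stars are taken as input because the text gives only the lowest and highest score thresholds, not the intermediate ones.

```python
def level_unlocked(best_stars, attempts, required_stars=2, max_attempts=5):
    """The next level opens with enough stars, or automatically after five trials."""
    return best_stars >= required_stars or attempts >= max_attempts

def world_unlocked(stars_per_level, required_total=20):
    """The next world opens once at least 20 stars are gathered in the current world."""
    return sum(stars_per_level) >= required_total
```

For the second pass through the game, the stricter criterion corresponds to calling `level_unlocked` with `required_stars=3`.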
Perception version
In the
The task of the player was to judge after each tone sequence whether the tones were aligned or not with the stimulus beat. The player responded by touching one of two buttons (“Yes” and “No”) presented in the middle of the screen (see Figure 3). The buttons appeared for 2.5 s, starting with the eighth tone of each sequence. A wrong answer led to the appearance of a story of the building which was not aesthetically appealing, and 25 points were subtracted from the final score (= 100 points at the beginning of the game). For a correct answer, the player’s reaction time (the faster, the better) determined the final score and whether the story of the building appearing was more or less aesthetically appealing. If the correct response was given in the first half of the response time window (i.e., within 1.25 s), the best version of the building was displayed, and no points were subtracted from the final score. If the correct response was provided later, but within the second half of the response time window (i.e., between 1.25 s and 2.5 s), a less appealing version of the building was displayed and 5 points were subtracted from the final score. Finally, if the player provided a wrong response for all four tone sequences, a minimum score of 5 points, corresponding to one star, was assigned, to avoid a null score that would be very demotivating for the player. Altogether, 396 triangle tone sequences were judged in the game. Half of them (198) were aligned to the beat of the stimulus. The other half (198) were misaligned. When misaligned, the sequence IBIs presented either a change in period (100 sequences) or in phase (98 sequences) relative to the stimulus IBI. Fifty sequences were presented at a tempo 10% slower, and the other 50 sequences at a tempo 10% faster, than the stimulus IBI. Moreover, tones in 49 sequences anticipated the stimulus beat by 30% of the IBI, while the tones in the other 49 sequences were presented later than the beat by 30% of the IBI. In both cases, the inter-tone interval was the same as in the stimulus.
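The scoring rule of the perception version can be written as a minimal Python sketch. The function name and the input format are assumptions made for illustration. The text does not say how a missing response is scored, so the caller is assumed to code it as incorrect.

```python
def perception_level_score(responses):
    """Score one level from its four (correct, reaction_time_s) responses."""
    score = 100  # starting score of a level
    for correct, reaction_time_s in responses:
        if not correct:
            score -= 25  # wrong answer
        elif reaction_time_s > 1.25:
            score -= 5   # correct, but in the second half of the 2.5-s window
    return max(score, 5)  # floor of 5 points (one star) instead of a null score
```

For example, two fast and two slow correct answers give 90 points, and four wrong answers give the minimum of 5 points.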

Examples of the response windows in the perception version of
Tapping version
As in the
Circular statistics computed from the last 15 taps prior to the appearance of a story of the building were used to assess tapping performance in real time. Synchronization
If the score, computed before showing a building’s story, was lower than 70, the worst version of the building was displayed. A good version was presented if the score was between 70 and 90, and a very good one if the score was higher than 90. The versions of the buildings were the same as in the
Assessment of rhythmic skills before and after training
Participants’ rhythmic skills, namely beat perception and sensorimotor synchronization to the beat, were tested before and after the training protocol with two tasks taken from BAASTA (Dalla Bella, Farrugia, Benoit et al., 2017): the BAT and a paced tapping task. In the BAT, 72 musical fragments lasting 20 beats from Bach’s “Badinerie” and Rossini’s “William Tell Overture” were presented at three different tempos, with 450-, 600- and 750-ms IBIs. From the 7th beat, an isochronous sequence (percussion sound) was superimposed onto the music. The sequence was either aligned with the beat of the music or non-aligned (in terms of phase or period). The participant judged whether the percussion sounds were aligned or not with the beat. In the paced tapping task, participants were asked to tap with their index finger to the sounds of a metronome at three tempos (450-, 600- and 750-ms inter-stimulus intervals (ISIs)), and to the beat of music (two excerpts from Bach’s “Badinerie” and Rossini’s “William Tell Overture”; IBIs = 600 ms). Each trial of the paced tapping task was repeated twice.
Procedure
Each participant in the
Results
Usability and compliance
The total time played was 205.9 mins (

Mean scores obtained in the nine worlds for the tapping and the perception groups. Error bars indicate Standard Error of the Mean.
Finally, we compared the results obtained with the three types of stimuli (metronome, metrical sequence, and music) by the participants in the two groups (see Figure 5). Participants in both groups performed better with the metronome sequences than with the metrical sequences (

Mean scores obtained with the three types of stimuli for the tapping and the perception groups. Error bars indicate Standard Error of the Mean.
Motivation
Mean motivation ratings for each session in the two groups are shown in Figure 6. In all the sessions, motivation was much higher than the average value of the scale (3.5), irrespective of the version of the game. We compared the motivation in the 10 sessions for the two groups with a 10 x 2 ANOVA. Session (1 through 10) was the within-subject factor and Group (

Motivation scores across the 10 testing sessions. Error bars indicate Standard Error of the Mean.
Beat perception and tapping
For the BAT, a sensitivity index (
Individual performances (
Improvement: *small, **average, ***large.
Worsening: †small, ††average, †††large.
A significant improvement was found only for the
In the control group, half of the participants improved their performance in the second evaluation session by more than 10% while the other five had only a small improvement. Two participants worsened their performance, while the last three participants remained stable.
Most of the participants in the
The effects of the training were less visible in the
In the tapping tasks, synchronization
Discussion
The goal of this experiment was to pilot a training protocol based on
The
This protocol creates optimal conditions for improving rhythmic skills. First evidence was provided that beat perception tested with the BAT was clearly improved by the
General discussion
The results of this proof-of-concept pilot study show high motivation of the participants when playing the game, which is sustained for the two weeks of the training protocol. This finding is very encouraging because players’ motivation attests that the game is equally engaging across the different worlds, in both the
All the players managed to play and to obtain satisfying scores, in spite of the notable variability in the players’ performances. The serious game and the protocol are sufficiently flexible to adapt to initial individual differences in rhythmic skills, without hindering players’ motivation. Variability in rhythmic skills in individuals without musical training is expected since it reflects the various profiles of rhythmic abilities in the general population (Bégel et al., 2017; Launay, Grube, & Stewart, 2014; Sowiński & Dalla Bella, 2013; Tranchant, Vuvan, & Peretz, 2016). Participants’ scores in the
We first provided evidence that a training protocol using a serious game such as
Against our expectations, we found no significant effect of the
In summary, we devised
