## Abstract

While a number of theories have been advanced to account for why musical consonance is related to simple frequency ratios, as yet there is no completely satisfying explanation. Here, we explore the theory of synchronization properties of ensembles of coupled neural oscillators to demonstrate why simple frequency ratios may have achieved a special status and why they are important for auditory perception. The analysis shows that the ordering of the mode-locked states gives precisely the standard ordering of consonance as often listed in Western music theory. Our results thus indicate the importance of neural synchrony in musical perception.

## 1. Introduction

For more than two millennia musicians and theorists have debated those factors that tend to give rise to the perception of musical consonance and dissonance (Helmholtz 1877; Plomp & Levelt 1965; Roederer 1975; Tenney 1988; Hartmann 1998). Although there is no single musical definition, consonance is usually referred to as the pleasant, ‘stable’ sound sensation produced by certain combinations of two tones played simultaneously. By contrast, dissonance is the unpleasant grating sound heard with other sound combinations. The common octave, for example, is judged as consonant, while playing two adjacent keys on the piano together (i.e. a semitone) is perceived as dissonant (see electronic supplementary material). The dominant theory to explain these sensations is attributed to Pythagoras and suggests that the simpler the frequency ratio between two tones, the more consonant they will be perceived to be, the sonority being reflected in the resulting interval's ‘pleasantness’ (figure 1). Consider two pure tones having frequencies *f*_{1}=*P* and *f*_{2}=*Q*. According to the Pythagorean view, the consonance of the two tones may be ordered by the simplicity of their relative integer frequency ratio *P* : *Q* (Roederer 1975; Tenney 1988). Simple integer ratios, argued Galileo, are ‘commensurable in number, so as not to keep the ear drum in perpetual torment’ (Tenney 1988). Thus, the consonant octave is characterized by a 1 : 2 frequency ratio between two tones, while the dissonant semitone is characterized by a 15 : 16 ratio. In Western culture, the intervals are often listed in the decreasing order of ‘perfection’ shown in table 1.

Preference for musical intervals of simple frequency ratios such as the octave, fifth and fourth, might simply reflect education or immersion and exposure to Western musical practices. Cross-cultural examinations of scale structure in music show that there is a high preponderance of fifths (2 : 3), fourths (3 : 4) and octaves (1 : 2; Schellenberg & Trehub 1994*b*). Moreover, it is well known that the simplicity of frequency ratios has played a central role in musical theories of intervallic consonance and dissonance (Helmholtz 1877; Tenney 1988). It has thus become a common view that musical consonance is, to a possibly large extent, learnt through exposure to musical culture. The learning process might thus be chiefly responsible for the special status of tones related by simple frequency ratios. By contrast, Schellenberg & Trehub (1994*a*,*b*, 1996*a*,*b*) attempted to explore the possibility that the special perceptual status of intervals with simple frequency ratios derives from a natural or inherent biological basis. This was achieved by evaluating infants' ability to detect subtle changes to patterns of simultaneous and sequential pure tones. Their results confirmed that simple, as opposed to complex, frequency ratios are more readily identified by listeners and consequently are more likely to result in a stable perceptual representation. As this was true even for infants, the perceptual status of these special intervals is unlikely to be due to education or exposure to Western musical practices.

## 2. Helmholtz's theory of beating harmonics

A scientific basis for the phenomenon of consonance and dissonance was established by Helmholtz (1877) and was based on the number and strength of ‘beating’ harmonics in a pair of simultaneous complex tones (Roederer 1975; Hartmann 1998). Helmholtz argued that for two complex tones in unison (*P* : *Q*=1 : 1) or an octave apart (*P* : *Q*=1 : 2), all harmonics of the second tone are aligned and coincident in frequency with the first, and thus the outcome is highly consonant. However, as the frequency ratio *P* : *Q* becomes more ‘complicated’, the two tones share fewer common harmonics, while there is an increase of harmonic pairs slightly mismatched in frequency. According to Helmholtz's (1877) linear theory, these latter nearby harmonics interact and lead to an unpleasant beating sensation that results in dissonance.

The beating effect may be understood mathematically by considering the linear addition of two pure sine tones (i.e. with no harmonics) having almost the same frequencies *ω*_{1} and *ω*_{2}=*ω*_{1}+*δ*, both of the same amplitude. Summing these signals linearly gives

$$\sin(\omega_1 t)+\sin(\omega_2 t)=2\cos\!\left(\frac{\delta t}{2}\right)\sin(\bar{\omega}t),\tag{2.1}$$

where the average frequency $\bar{\omega}=(\omega_1+\omega_2)/2$. Thus, a listener will not have the impression of listening to two different frequencies but instead will hear a single pure tone with a pitch corresponding to the average frequency and with loudness that varies slowly leaving a *beating* sensation oscillating with an envelope at frequency *δ*=*ω*_{2}−*ω*_{1}. The beating disappears only after surpassing a sufficiently large frequency difference, at least *δ*>15 Hz (see Roederer 1975, p. 28). All signs of roughness disappear when the frequency difference surpasses ‘the critical bandwidth’, which is approximately 10–20% of the centre frequency for frequencies greater than 500 Hz, and pure tones sound both ‘smooth’ and ‘pleasing’ (Plomp & Levelt 1965; Roederer 1975, p. 28).
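The product form of the two-tone sum can be checked numerically. The sketch below compares the direct sum of two equal-amplitude sines with an envelope at half the difference frequency multiplying a tone at the mean frequency. The 440 Hz and 444 Hz values, the sampling grid and the function names are illustrative assumptions, not values taken from the text, and the frequencies are treated as angular frequencies (radians per second).

```python
import math

def two_tone_sum(w1, w2, t):
    """Direct linear sum of two equal-amplitude sine tones."""
    return math.sin(w1 * t) + math.sin(w2 * t)

def beat_form(w1, w2, t):
    """Same signal written as a slow envelope times a tone at the mean frequency."""
    delta = w2 - w1
    w_mean = 0.5 * (w1 + w2)
    return 2.0 * math.cos(0.5 * delta * t) * math.sin(w_mean * t)

# Illustrative values only: a 440 Hz tone and a neighbour 4 Hz higher.
w1 = 2 * math.pi * 440.0
w2 = 2 * math.pi * 444.0
times = [k / 8000.0 for k in range(8000)]  # one second sampled at 8 kHz

# Largest pointwise disagreement between the two expressions.
max_gap = max(abs(two_tone_sum(w1, w2, t) - beat_form(w1, w2, t)) for t in times)
print(max_gap)
```

The two expressions agree to rounding error. Because loudness follows the magnitude of the envelope, a 4 Hz separation produces four loudness maxima per second.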

## 3. Problems with Helmholtz's theory

Helmholtz's (1877) theory is scientifically appealing, yet it remains controversial and fails to explain a number of non-trivial aspects central to musical psychoacoustics:

- Plomp & Levelt (1965) have demonstrated that once the frequency difference *δ* between two pure-tone intervals exceeds 3 semitones (i.e. beyond the critical bandwidth), no roughness can be experienced by the ear. However, beyond this critical bandwidth the evaluation of consonance can vary considerably and change direction (with peaks and valleys) as *δ* increases. Yet, these changes of consonance occur despite the absence of harmonics, and thus in a regime where beats should be entirely absent. Clearly, Helmholtz's theory of beats is unable to explain these consonance sensations.
- When applying sequential pure tones that do not enter the ear simultaneously, Helmholtz's theory would no longer seem applicable. Nevertheless, sequential pure-tone intervals with simple (as opposed to complex) frequency ratios were found to be more ‘readily processed by listeners’ (Schellenberg & Trehub 1996*a*). Here, ease of processing a tone pattern referred to enhanced discrimination of that pattern in experiments. This suggests a special perceptual status for intervals with simple frequency ratios.
- Experimental studies have shown that patients with auditory cortex lesions lack the ability to evaluate consonance in a similar manner to normal patients (Peretz *et al*. 2001; Tramo *et al*. 2001). This raises the question as to whether the source of musical perception is governed by peripheral mechanisms in the inner ear as held by Helmholtz. Rather it suggests the existence of specific neural pathways that are devoted to dissonance computation and that can be disrupted selectively by brain damage (Tramo *et al*. 2001).
- The EEG responses of subjects to pairs of pure tones show that neural processing of consonance depends on higher associative processing of pitch relationships in the cerebral cortex (Itoh *et al*. 2003). That is, consonance is not just the absence of roughness but is determined by neural processing in the auditory cortex. Itoh *et al*. (2003) reached this conclusion by studying the auditory evoked potentials indicative of cortical activity response. Of the intervals studied (1, 4, 6, 7, 9 semitones), they found that in all cases the evoked potentials were at their highest (in terms of voltage) for two pure tones separated by a perfect fifth (7 semitones) when compared with other intervals. These results provide electrophysiological evidence that matches behavioural preference for simple frequency ratios. Given that only pure tones were used in the experiments, this preference has nothing to do with the beating of harmonics which forms the basis of Helmholtz's theory (1877).

## 4. Coupled oscillator model of auditory perception

We are thus led to ask, over and above Helmholtz's beating phenomena, why do some combinations of tones sound more pleasant than others? The answer to this question may well have to do with the nonlinear dynamics of auditory perception, in contrast to Helmholtz's solely linear framework. Consider then, two coupled ‘integrate and fire’ neural oscillators that in the absence of coupling have distinct frequencies *ω*_{1} and *ω*_{2} and a relative frequency ratio *Ω*=*ω*_{1}/*ω*_{2}. Each oscillator might typically represent a neuron, or a population of neurons. Such signals are processed in the auditory cortex within the right superior temporal gyrus that is believed to be involved in the analysis of pitch and timbre (Samson & Zatorre 1994; Zatorre *et al*. 1994; Blood *et al*. 1999). In response to a specific auditory tone frequency stimulating the cochlea, such an oscillator would fire at a given frequency. For modelling simplicity, firing frequencies may be the same as the driving frequencies, but in reality may be scaled-down versions of them, since neurons cannot fire at rates much beyond a kilohertz.

A simple scheme of two mutually coupled oscillators that captures the generic behaviour consists of two voltage variables *x*_{1}, *x*_{2} as follows (Coombes & Lord 1997):

$$\frac{dx_1}{dt}=-\frac{x_1}{\tau_1}+I_1+\epsilon E_1(t),\qquad \frac{dx_2}{dt}=-\frac{x_2}{\tau_2}+I_2+\epsilon E_2(t).\tag{4.1}$$

Here *τ*_{1}, *τ*_{2} are decay constants; *E*_{1}(*t*) represents the effect of neuron-2 on neuron-1 and vice versa; *I*_{1} and *I*_{2} represent the external input that *x*_{1} and *x*_{2} receive, respectively; and *ϵ* represents the strength of coupling between the neurons.

The first oscillator (*x*_{1}) increases in voltage and ‘fires’ only when it reaches a fixed threshold (*x*_{1}=1). After firing, the oscillator is instantaneously reset to zero (*x*_{1}=0), while the voltage of the second oscillator is instantaneously increased by *ϵE*_{2}(*t*), i.e.

$$x_1\to 0,\qquad x_2\to x_2+\epsilon E_2(t)\qquad\text{when } x_1=1.$$

The strength of coupling between the oscillators is thus determined by *ϵ*. One of the simplest coupling schemes assumes that communication between the neurons is via a sharp infinitesimal pulse, such as the Dirac *δ*-function (Mirollo & Strogatz 1990),

$$E_2(t)=\sum_j \delta\!\left(t-t_1^{j}\right),$$

where $t_1^{j}$ denotes the *j*th firing time of oscillator-1. The firing of neuron *x*_{1} thus results in an increase by an amount *ϵ* in the voltage of oscillator-2.

The simple Dirac *δ*-function pulse is only a first approximation. In reality, the effective input to the neuron has a longer temporal duration due to the synaptic transmission process. One particular pulse shape that approximates the rise and fall time of real synaptic currents in a realistic fashion is of the following form (Jack *et al*. 1975):

$$\alpha(t)=\alpha^{2}t\,e^{-\alpha t}\,\Theta(t).\tag{4.2}$$

Here *α*(*t*) represents the exponential rise (and fall) of the synapse of *x* as shown in figure 2, and *Θ*(*t*) is a step function such that

$$\Theta(t)=\begin{cases}1, & t>0,\\ 0, & t\le 0.\end{cases}$$

The maximal synaptic response occurs at a time *α*^{−1} after the arrival of an action potential (Coombes & Lord 1997). In practice, the final input to the neuron is a sum of distributed delays represented by alpha functions, which gives (Coombes & Lord 1997)

$$E_1(t)=\sum_j \alpha\!\left(t-t_2^{j}\right),\qquad E_2(t)=\sum_j \alpha\!\left(t-t_1^{j}\right),\tag{4.3}$$

where $t_i^{j}$ is the *j*th firing time of oscillator-*i*. The above formula makes allowance for the fact that the voltage of the oscillator is increased by an amount calculated over the weighted sum of all past firings of its neighbouring coupled oscillator. (For *α*→∞, the simple case of coupling via a *δ*-function is retrieved.)
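Two properties of the alpha-function pulse can be checked numerically: that it peaks at a time *α*^{−1} after onset, and that it carries unit area, which is what allows it to approach a *δ*-function as *α*→∞. The sketch below assumes the common form *α*²*t*e^{−*αt*} for *t*>0. The rate constant, the grid spacing and the helper name `alpha_pulse` are hypothetical choices made for illustration.

```python
import math

def alpha_pulse(t, alpha):
    """Alpha-function synaptic pulse: alpha^2 * t * exp(-alpha * t) for t > 0, else 0."""
    if t <= 0.0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)

alpha = 5.0   # hypothetical rate constant, chosen only for illustration
dt = 1e-4
grid = [k * dt for k in range(int(10.0 / dt))]  # sample times in [0, 10)
values = [alpha_pulse(t, alpha) for t in grid]

t_peak = grid[values.index(max(values))]  # time of the largest sampled value
area = sum(values) * dt                   # Riemann-sum estimate of the integral
print(t_peak, area)
```

For these values the peak lies close to 1/*α* and the estimated area is close to one.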

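The frequency *ω*_{i} of oscillator-*i* when uncoupled is found by solving the differential equation

$$\frac{dx_i}{dt}=-\frac{x_i}{\tau_i}+I_i$$

to obtain $x_i(t)=I_i\tau_i\left(1-e^{-t/\tau_i}\right)$, given the initial conditions *x*_{i}=0 at *t*=0. Note that one firing occurs in the time frame $[0,T_i]$, where *x*_{i}=0 changes to *x*_{i}=1 at $t=T_i$. The period of the oscillator's firing cycle can be calculated by inserting *x*_{i}=1 in the above equation giving $T_i=\tau_i\ln\left[I_i\tau_i/(I_i\tau_i-1)\right]$. Thus, the natural firing frequency of the oscillator when uncoupled is

$$\omega_i=\frac{1}{T_i}=\left[\tau_i\ln\frac{I_i\tau_i}{I_i\tau_i-1}\right]^{-1}.$$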
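The closed-form period of an uncoupled oscillator can be cross-checked against a direct numerical integration. The sketch below assumes the leaky integrate-and-fire form d*x*/d*t* = −*x*/*τ* + *I* with threshold 1 and reset to 0. The parameter values and the function names are hypothetical. The threshold is only reached when *Iτ* > 1, which the code checks explicitly.

```python
import math

def natural_period(tau, current):
    """Closed-form time for x to rise from 0 to threshold 1 under dx/dt = -x/tau + I."""
    drive = current * tau
    if drive <= 1.0:
        raise ValueError("I * tau must exceed 1, otherwise the threshold is never reached")
    return tau * math.log(drive / (drive - 1.0))

def simulated_period(tau, current, dt=1e-5):
    """Forward-Euler integration of the same equation until x first reaches 1."""
    x, t = 0.0, 0.0
    while x < 1.0:
        x += dt * (-x / tau + current)
        t += dt
    return t

tau, current = 1.0, 2.0                   # hypothetical parameter values
T_exact = natural_period(tau, current)    # equals tau * ln(2) for these values
T_euler = simulated_period(tau, current)
omega = 1.0 / T_exact                     # natural firing frequency
print(T_exact, T_euler, omega)
```

The Euler estimate agrees with the closed form to within a few time steps.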

## 5. Synchronization and simple frequency ratios

By virtue of the coupling, the two oscillators are able to synchronize or ‘mode lock’ (Schuster 1995; Coombes & Lord 1997) so that their firing patterns repeat with the same fixed period. Figure 3 shows time series of the two oscillators in a 2 : 3 mode-locked state. To understand the subtleties of mode locking in more detail, one needs to compare the ratio of the observed oscillator frequencies when coupled *Δ*_{1}/*Δ*_{2} to the ratio of the oscillators' natural intrinsic frequencies *Ω*=*ω*_{1}/*ω*_{2}. The oscillators tend to mode lock to a simple firing ratio *P* : *Q*=*Δ*_{1}/*Δ*_{2} which is close but not necessarily equal to the ratio of the oscillators' intrinsic frequencies *Ω*=*ω*_{1}/*ω*_{2}. The beauty of the synchronization is that the mode-locked state (e.g. 1 : 2) is stable to small changes in the frequencies *ω*_{1} or *ω*_{2} and thus *Ω*. In practice this means that should the intrinsic frequencies of the oscillators change slightly, the system's synchronized solution will nevertheless remain unaffected. This is demonstrated graphically in figure 4, where *Ω*=*ω*_{1}/*ω*_{2} is varied; yet there are horizontal plateaus where the system's synchronized solution *P* : *Q*=*Δ*_{1}/*Δ*_{2} stays unchanged.

Figure 4 gives simulation results showing the width of the interval Δ*Ω* for which the ratio *Ω*=*ω*_{1}/*ω*_{2} may be changed while the mode-locked state *P* : *Q* remains constant. The vertical axis in figure 4 corresponds to the ratio of the observed frequencies of the coupled oscillators, namely *P* : *Q*=*Δ*_{1}/*Δ*_{2}, while the horizontal axis corresponds to the ratio *Ω*=*ω*_{1}/*ω*_{2}. The stability interval of 1 : 1 is marked by Δ*Ω*_{1}, 1 : 2 by Δ*Ω*_{2} and 2 : 3 by Δ*Ω*_{3}. The complete set of mode-locked states is referred to as a Devil's staircase (Schuster 1995) and is a universal feature of driven coupled oscillators. Note that the width of the mode-locked interval Δ*Ω* should be considered as an indicator of the structural stability of the synchronization. The wider the interval, the stronger the structural stability. Thus, for example, the unison (1 : 1) might be considered a more stable synchronization than the octave (1 : 2) since Δ*Ω*_{1}>Δ*Ω*_{2}. This correspondence between musical intervals and mode-locked states was previously sketched out in Stone (2000).
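The comparison between observed and intrinsic frequency ratios can be explored with a small event-driven simulation. The sketch below is not the simulation behind figure 4. It assumes the simplest *δ*-pulse coupling, leaky integrate-and-fire dynamics with threshold 1 and reset to 0, and one extra convention of its own: a unit pushed over threshold by a kick fires at once and does not kick back. The function name `firing_counts`, the parameter values and the initial voltages are hypothetical.

```python
import math

def firing_counts(tau, current, eps, t_end, x0=(0.0, 0.3)):
    """Event-driven simulation of two leaky integrate-and-fire units with delta-pulse coupling.

    Between events each voltage relaxes towards I * tau; a unit fires at x = 1,
    resets to 0 and kicks the other unit up by eps.
    """
    x = list(x0)
    counts = [0, 0]
    t = 0.0
    while True:
        # Time each unit needs to reach threshold if left alone.
        waits = [tau[i] * math.log((current[i] * tau[i] - x[i]) / (current[i] * tau[i] - 1.0))
                 for i in range(2)]
        i = 0 if waits[0] <= waits[1] else 1   # index of the next unit to fire
        j = 1 - i
        dt = waits[i]
        if t + dt > t_end:
            return counts
        t += dt
        # Advance the non-firing unit along its exact exponential trajectory.
        x[j] = current[j] * tau[j] + (x[j] - current[j] * tau[j]) * math.exp(-dt / tau[j])
        x[i] = 0.0
        counts[i] += 1
        x[j] += eps
        if x[j] >= 1.0:
            # Assumption: a unit pushed over threshold fires at once and does not kick back.
            x[j] = 0.0
            counts[j] += 1

# Hypothetical parameters: equal decay constants, different drives.
tau, current = (1.0, 1.0), (2.0, 3.0)
uncoupled = firing_counts(tau, current, eps=0.0, t_end=200.0)
coupled = firing_counts(tau, current, eps=0.1, t_end=200.0)
print(uncoupled[0] / uncoupled[1], coupled[0] / coupled[1])
```

With zero coupling the ratio of firing counts approaches the intrinsic ratio *Ω*. Repeating the coupled run while sweeping one of the drives gives an observed ratio as a function of *Ω*, which can be inspected for plateaus.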

Table 1 shows a more detailed summary of the ordering of the stability index of the mode-locked states and reveals a correspondence with the theoretical ordering of musical intervals according to their consonance evaluation. The ordering corresponds to ratio simplicity discussed in Schellenberg & Trehub (1994*b*), where the simplest ratios (e.g. 1 : 1, 1 : 2, 2 : 3) are the most consonant. The ordering corresponds to that given in Helmholtz (1877, pp. 183 and 194) and Roederer (1975, p. 141, table 5.2) who regard it as having been accepted in the Western musical culture.

Theoretical arguments from a study of the generic ‘circle map’ also lead us to expect a relationship between the simplicity of the frequency ratio *P* : *Q* and the width of the stability interval Δ*Ω* (Cvitanović *et al*. 1985). The relationship has been connected to a mathematical construct, the ‘Farey tree’, which orders all rational fractions *P*/*Q* in the interval [0,1] according to their increasing denominators *Q* (Cvitanović *et al*. 1985). As the circle map is a paradigmatic model for a large class of coupled oscillators, the ordering of intervals by the stability index should be considered parameter independent in general.
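The link between ratio simplicity and plateau width can be illustrated with the sine circle map, a standard concrete instance of the generic circle map. The sketch below is a rough numerical estimate only. The choice of the sine map, the coupling *K*=1, the grid resolution, the tolerance and the function names are all assumptions made for this example.

```python
import math

def winding_number(omega, k, n_iter=1500):
    """Average advance per iteration of the lifted sine circle map, started at 0."""
    theta = 0.0
    for _ in range(n_iter):
        theta = theta + omega - (k / (2.0 * math.pi)) * math.sin(2.0 * math.pi * theta)
    return theta / n_iter

def plateau_width(p, q, k=1.0, n_grid=500, tol=1e-3):
    """Rough width of the P/Q step: fraction of omega in [0, 1] whose winding number is P/Q."""
    hits = sum(1 for m in range(n_grid)
               if abs(winding_number((m + 0.5) / n_grid, k) - p / q) < tol)
    return hits / n_grid

w_half = plateau_width(1, 2)    # step locked to the ratio 1/2
w_third = plateau_width(1, 3)   # step locked to the ratio 1/3
print(w_half, w_third)
```

In this sketch the 1/2 step comes out wider than the 1/3 step, in line with the Farey-tree ordering by increasing denominator.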

## 6. Discussion

It should be noted that there may be more than one neural source that contributes to our perception of consonance and dissonance. Neural processing of auditory stimuli is complex, and it is possible that some combination of physical properties at the ear, primary auditory processing and secondary or associative processing plays a role in this perception. Synchrony effects underlying these layers of complexity nevertheless may hold important clues in any attempt to explain consonance. Indeed, Cartwright *et al*. (2001) have explored a similar dynamical systems approach whereby the synchrony characteristics of three coupled oscillators (three frequency resonances) may resolve the puzzling perception of the ‘missing fundamental’. Their theory accounts for the manner in which a fundamental is mysteriously perceived in a set of tones played simultaneously, even though it is absent.

Having presented a theory of consonance and dissonance, it is important to emphasize that the effects we describe are intended to deal solely with pure-tone intervals outside of any musical context. This is to deliberately exclude the emotional component that is evoked when listening to harmonic musical progressions. Thus, the jazz musician might love hearing dissonance in music, but this phenomenon falls outside the scope of the theory presented here.

A selection of examples of consonant and dissonant sounds may be found in the electronic supplementary material.

Although Helmholtz's theory of beating harmonics is a delightful explanation for consonance and dissonance perception, as shown above, it nevertheless fails to account for many phenomena well known in the literature. In such cases, other explanations are needed. Partly owing to this, neural synchrony has in the past been postulated as an important mechanism in auditory perception (Boomsliter & Creel 1961; Palisca & Moore 2001). Palisca & Moore (2001) justify their ‘explanation in terms of the synchrony of neural impulses … [since it] is supported by the observation that both our sense of musical pitch and our ability to make octave matches largely disappear above 5 kHz, the frequency at which neural synchrony no longer appears to operate’ (Palisca & Moore 2001). The model presented here serves to extend their argument since it explains why human preference for simple frequency ratios in pure tones may be a natural consequence of neural synchronization.

## 7. Glossary

*Pure tone* is a single frequency tone with no harmonic content (no overtones). This corresponds to a sine wave. It is characterized by the *frequency* (the number of cycles per second) and the *amplitude* of the cycles.

*Complex tone* is a combination of the fundamental frequency tone together with its harmonic components (its overtones). For a sine wave, the harmonics are integer multiples of the fundamental frequency of the wave. For example, if the fundamental frequency is *f*, the harmonics have frequency 2*f*, 3*f*, 4*f*, etc. Sounds produced from musical instruments are complex tones.

*Pitch*. A pitch is the perceived fundamental frequency of a tone.

*Interval*. In music theory, the term interval describes the difference in pitch between the fundamental frequencies of two notes. Intervals may be labelled according to the ratio of frequencies of the two pitches. Important intervals are those using the lowest integers, such as 1 : 1 (unison), 1 : 2 (octave), 2 : 3 (perfect fifth), 3 : 4 (perfect fourth), etc. as shown in table 1.

The ‘just intonation’ tuning (in which the frequencies of notes are related by ratios of integers) is the basic scaling method, but due to practical implementation difficulties on some musical instruments, the ‘equal temperament’ tuning was introduced (in which the octave (1 : 2) is divided into a series of equal steps).
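The size of the compromise made by equal temperament can be computed directly. The sketch below converts the just-intonation ratios named above into cents (1200 cents per octave) and compares them with the nearest equal-tempered interval. The semitone counts and the helper names `cents` and `tempering_error` are supplied here for illustration and are not taken from table 1.

```python
import math

# Just-intonation ratios P : Q with their size in equal-tempered semitones.
JUST_INTERVALS = {
    "unison": (1, 1, 0),
    "octave": (1, 2, 12),
    "perfect fifth": (2, 3, 7),
    "perfect fourth": (3, 4, 5),
}

def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)

def tempering_error(p, q, semitones):
    """Difference in cents between the equal-tempered interval and the just ratio Q/P."""
    return 100.0 * semitones - cents(q / p)

for name, (p, q, semitones) in JUST_INTERVALS.items():
    print(f"{name:15s} {p}:{q}  just = {cents(q / p):8.3f} cents, "
          f"tempered = {100 * semitones} cents")
```

The octave is reproduced exactly, while the equal-tempered fifth is about 2 cents narrower than the just 2 : 3 ratio and the fourth about 2 cents wider than 3 : 4.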

*Sonority* is a term that refers to the quality of a musical tone. In particular, it refers to the resonance, richness or fullness of tone.

## Acknowledgments

We thank David Earn for encouraging us over many years to finalize this work. We thank Bernd Blasius for initial simulations of an integrate–fire neuron model, and the helpful comments of four referees. We acknowledge the generous support of the James S. McDonnell Foundation and the Adams Super Center for Brain Studies.

## Footnotes

- Received April 14, 2008.
- Accepted May 22, 2008.

- © 2008 The Royal Society