Dissatisfied with the explanations for music's appeal advanced by philosophers, scientists, and poets alike, Purves, along with Schwartz and Catherine Howe Ph.D. '03 (then a postdoctoral fellow at Duke, now a resident in psychiatry), decided to investigate for themselves, using an approach based on biology and evolution. They began by asking how the natural environment might have shaped our musical preferences.
A key aspect of music is that it is tonal, meaning it is made up of regularly repeating or "periodic" sounds. Most periodic sounds in nature are made by living things. "In order to produce periodic sounds, you need a system that has an oscillator coupled to an energy source that is able to sustain vibrations," Schwartz says. "All living things, whether you're talking about insects or frogs or you and me, can produce and control energy."
The most salient periodic sounds humans hear on a consistent basis are the vocalizations we make to communicate with one another by means of speech, Purves says. And speech, like music, is tonal. Furthermore, the tonal quality of speech is specifically associated with the production of vowels, which in English include the sounds represented by the letters a, e, i, o, and u. That's because our vocal cords vibrate only during vowel production. (In contrast, spoken consonants, which in English include sounds represented by the letters b, f, t, p, and so on, don't require the vocal cords and are thus not tonal.)
"If you think of music as being generated by a string that's plucked on a guitar, that's very similar physically to the vibrations of the vocal cords during vowel production," Purves explains.
From this knowledge, the team hypothesized that speech played a major role in shaping the evolution and development of the human auditory system, including the tonal preferences found in music. The team presented evidence supporting this idea in a 2003 study published in the Journal of Neuroscience. In the study, Purves, Schwartz, and Howe took short, ten- to twenty-second sentences spoken by more than 600 speakers of English and other languages and broke them into 50- to 100-millisecond sound bites. The resulting 100,000 or so sound segments were then manipulated to cancel out the peculiarities that distinguish one person's speech from another's, such as the pitch differences between men's and women's voices.
"What you end up with is what is common to all the speech sounds," Schwartz says: the vowels and the frequencies they produce when vocalized. Graphed, the homogenized speech sounds resembled jagged peaks and valleys. Remarkably, the peaks, which represent strong concentrations of acoustic energy, corresponded to most of the twelve notes in the chromatic scale.
"The peaks happen to occur at ratios that are exactly those that define the chromatic scale," Schwartz says. "It's one of those examples of a picture being worth a thousand words."
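The ratios Schwartz refers to can be written out explicitly. The sketch below lists just-intonation ratios commonly used to define the twelve chromatic intervals within an octave; the exact fractions chosen for the tritone and minor seventh vary by convention, and the article does not specify which tuning the study used.

```python
from fractions import Fraction

# Just-intonation frequency ratios commonly used for the twelve
# chromatic intervals within one octave (unison through octave).
# The tritone and minor-seventh fractions vary by convention.
chromatic_ratios = {
    "unison":        Fraction(1, 1),
    "minor second":  Fraction(16, 15),
    "major second":  Fraction(9, 8),
    "minor third":   Fraction(6, 5),
    "major third":   Fraction(5, 4),
    "fourth":        Fraction(4, 3),
    "tritone":       Fraction(45, 32),
    "fifth":         Fraction(3, 2),
    "minor sixth":   Fraction(8, 5),
    "major sixth":   Fraction(5, 3),
    "minor seventh": Fraction(16, 9),
    "major seventh": Fraction(15, 8),
    "octave":        Fraction(2, 1),
}

for name, ratio in chromatic_ratios.items():
    print(f"{name:>13}: {ratio} = {float(ratio):.3f}")
```

A peak in the averaged speech spectrum at, say, 1.5 times a reference frequency would line up with the fifth (3/2), one of the strongest consonances in music.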
After establishing a statistical link between speech and music, the researchers tried to determine which aspects of speech were generating the same intervals found in music.
Human speech begins with the vocal cords. Air forced up by the lungs passes over the cords, causing them to vibrate at certain frequencies, depending on the force of the air and the position of the vocal cords. These "base" frequencies are then modified by the soft palate, tongue, lips, and other parts of the vocal tract, filtering out some frequencies and creating additional "resonant" ones.
Purves likens our vocal cords to the strings of a guitar and the rest of the vocal tract to the guitar's body. "You pluck a guitar string absent the guitar, it sounds like hell," he says. "You need the resonance of the body of the guitar to transform the sound into something that sounds good, and that is basically what the vocal tract does."
The most energetic resonant frequencies of speech are called formants, and they are critical for vowel enunciation. Nearly every vowel can be characterized by two main formants that can be expressed as a numerical ratio. (The frequency of the first formant is between 200 and 1,000 hertz and the frequency of the second formant is between 800 and 3,000 hertz, depending on the vowel.)
In a 2007 study comparing speech and music, reported in the journal Proceedings of the National Academy of Sciences this past October, Purves, Deborah Ross, a postdoctoral research fellow in Duke's Center for Cognitive Neuroscience, and Jonathan Choi M.D. '08, a research associate, asked native speakers of English and Mandarin to pronounce vowel sounds both as part of individual words and as part of a series of short monologues.
They then used a spectrum analyzer to break apart vowels to reveal their component formants. A comparison of the ratios of the first and second vowel formants and the numerical ratios of musical intervals revealed that the two sets of ratios were very similar. "In about 70 percent of the speech sounds, these ratios were bang-on musical intervals," Purves says.
For example, when people say "o," as in the syllable "bod," the frequency ratio between the first two formants might correspond to a major sixth, the interval between the musical notes C and A. When they say the "oo" sound in "booed," the ratio matches a major third, the distance between C and E.
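The comparison described above can be sketched in a few lines: take a vowel's first two formant frequencies, fold their ratio into a single octave, and find the nearest musical interval. This is an illustrative sketch, not the authors' analysis code, and the formant values in the example are hypothetical numbers chosen to land exactly on a 5/3 ratio.

```python
import math

# Just-intonation interval ratios within one octave.
INTERVALS = {
    "unison": 1/1, "minor second": 16/15, "major second": 9/8,
    "minor third": 6/5, "major third": 5/4, "fourth": 4/3,
    "tritone": 45/32, "fifth": 3/2, "minor sixth": 8/5,
    "major sixth": 5/3, "minor seventh": 16/9, "major seventh": 15/8,
}

def nearest_interval(f1_hz, f2_hz):
    """Return the musical interval closest to the F2/F1 formant ratio."""
    ratio = f2_hz / f1_hz
    # Fold the ratio into one octave, [1, 2).
    while ratio >= 2.0:
        ratio /= 2.0
    # Closest interval on a logarithmic (pitch) scale.
    name = min(INTERVALS, key=lambda k: abs(math.log(INTERVALS[k] / ratio)))
    return name, ratio

# Hypothetical formant pair: F2/F1 = 1000/600 = 5/3, a major sixth.
print(nearest_interval(600, 1000))  # → ('major sixth', 1.666...)
```

Comparing on a log scale matters because pitch perception is logarithmic: a given frequency difference is a smaller musical step at higher frequencies.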
The results were similar in Mandarin speakers. In both languages, an octave gap was the most common, while a minor sixth, which is the interval between the musical notes C and A-flat, was fairly uncommon, a pattern reflected in the musical preferences of many cultures around the world (see Figure 2).
For both English and Mandarin speakers, the major formants in vowel sounds paralleled the most commonly used intervals in music worldwide, namely the octave, the fifth, the fourth, the major third, and the major sixth.
To Purves, the upshot is a simple truth: "There's a biological basis for music, and that biological basis is the similarity between music and speech," he says. "That's the reason that we like music."