In Search of Music's Biological Roots

Seeking to understand the universal appeal of music, neuroscientist Dale Purves has discovered surprising similarities between the twelve-note chromatic scale and the universal tones found in speech.
June 1, 2008

Dale Purves is not musical by nature. He's been trying to play the guitar for forty years, with limited success. He has no formal training and, if presented with a sheet of music, can't tell an F-sharp from a B-flat.

But even though he is not musical, Purves is deeply curious about music. Why, he wonders, do humans appear to be hard-wired to appreciate it despite its lack of a clear survival benefit? Why do we find some combinations of musical notes pleasing but can't stand others? And, perhaps most enticing, why do we think of some types of music as happy and bright, but others as dark and sad? In other words, how did music come to pack such an emotional wallop?

Complex and sublime: Amateur guitarist Purves contemplates the complicated role music plays in our lives. Michael Zirkle

Purves (pronounced purr-VEHZ), a physician and neurobiologist who heads Duke's Center for Cognitive Neuroscience, has made a name for himself studying human vision. More recently, he has also turned his attention to hearing. "We started looking at audition to compare it to the theories we developed in vision," he says.

Over the last five years, he and his fellow researchers have provided compelling evidence that our species' fondness for music is linked to another human universal: language. The findings suggest that humans like music because, in subtle and unconscious ways, it reminds us of speech, arguably the most important social cue in our environment and a critical factor in our species' survival and success.

"The only vocalizations that count for us are the vocalizations we make for each other," Purves says. "Those are the tonal sounds in nature that we've always been exposed to." His studies of music are part of a broader theory he is developing about how our perceptions are shaped not only by our individual experiences, amassed over decades, but also by the collective experiences of our species, gathered over millions of years of evolutionary time.

With the possible exception of love, nothing in the human experience is as difficult to define, or has attracted so many attempts at definition, as music. Music has been described as mysterious, sublime, even divine. The French novelist Victor Hugo said music "expresses that which cannot be put into words and that which cannot remain silent." Music has been called an "echo of the invisible," the "speech of angels," a "shorthand of emotion," and "unconscious arithmetic." The American poet and musician Sidney Lanier considered the two mysteries equal, and called music "love in search of a word."

What everyone does agree upon is music's universal ability to transcend time, geography, and culture. Throughout history, people living in every corner of the globe have made and listened to music. And whether they coax music by plucking, striking, or blowing instruments crafted from wood, metal, or bone, people in just about every culture make music in the same general way, using subgroups of the same twelve notes.

These notes are known as the chromatic scale and can be heard on a piano by starting with any key and then playing the next twelve black and white keys in succession. On the thirteenth note, the scale begins again, but at a higher frequency. The interval between one piano key and a key of the same name either above or below it is called an octave.

No culture uses all twelve notes of the chromatic scale in its music, but nearly all musical traditions make music based on some combination of notes within it. Traditional Chinese music and much of American folk music, for example, are made using what's called the pentatonic scale, which uses five of the notes within the octave (F and B are not used). The five notes of the pentatonic scale are a subset of the seven-note diatonic scale used in Western classical music. The diatonic scale is the familiar "Do-Re-Mi-Fa-So-La-Ti-Do" taught in schools.

The widespread use of the chromatic scale is puzzling if you consider that the human auditory system is capable of distinguishing a very large number of notes, also called pitches, over the range of sound frequencies that humans can hear (about 20 to 20,000 "hertz" or cycles per second).

"Why is it, despite the fact that we can hear many, many different pitch relationships, we use just these twelve relationships in music pretty much universally?" Purves asks. "There are embellishments on this basic fact (Arabian and Indian music and American blues use some well-defined variations), but, basically, we humans all build music using the same bricks."

Not only do different cultures compose music using the same notes, they also agree on which note combinations sound pleasing and which rankle the ears, a phenomenon music theorists and auditory scientists call relative consonance. For example, given a choice, nearly everyone in the world will agree that C together with F, which is the musical interval called a fourth, is a more pleasing note combination than C together with F-sharp, which is called a tritone.

Philosophers and scientists have struggled for centuries to explain why we find certain combinations so appealing. One of the earliest attempts looked at music's mathematical properties. Some 2,500 years ago, the Greek philosopher Pythagoras, who was obsessed with numbers and their significance, demonstrated a direct relationship between how pleasing or harmonious some tone combinations sounded and the physical dimensions of the object that produced them. For example, a plucked string will always sound a fourth lower in pitch than an identical string three-quarters its length, a fifth lower than a string two-thirds its length, and a full octave lower than a string half its length.

Pythagoras believed that the intervals of the fourth, the fifth, and the octave sounded beautiful because the ratios of the frequencies of the two notes making up each interval were small-number fractions such as 4/3, 3/2, or 2/1. "It's basically a mystical explanation," says David Schwartz, a Duke neuroscientist who has worked with Purves on his studies of music. "He thought that the gods in some sense preferred simple small numbers, and that the pleasure we take in the sounds of these intervals is a perceptual manifestation of the intrinsic beauty of small-number ratios."
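The string lengths and frequency ratios described above are tied together by simple arithmetic. The sketch below assumes an idealized string whose frequency is inversely proportional to its length (same tension and thickness), a simplification the article does not spell out. The function name and the 110-hertz reference pitch are illustrative choices, not anything taken from Pythagoras or the Duke researchers.

```python
from fractions import Fraction


def frequency_ratio(length_fraction):
    """Frequency of a shortened string relative to the full-length string.

    Assumes an ideal string: frequency is inversely proportional to length.
    """
    return 1 / Fraction(length_fraction)


# Length fractions named in the text, keyed by the interval each one produces.
intervals = {
    "fourth": Fraction(3, 4),
    "fifth": Fraction(2, 3),
    "octave": Fraction(1, 2),
}

full_string_hz = 110  # illustrative reference pitch, not from the article

for name, length in intervals.items():
    ratio = frequency_ratio(length)
    pitch_hz = float(full_string_hz * ratio)
    print(f"{name}: string length {length} -> frequency ratio {ratio} "
          f"({pitch_hz:.1f} Hz)")
```

Under that assumption, shortening the string to three-quarters, two-thirds, or one-half of its length yields exactly the 4/3, 3/2, and 2/1 frequency ratios that Pythagoras singled out.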

"Pythagoras and many others had mystified music by making it seem that it had to do with celestial motions," Purves says. "That's just hocus-pocus.

"Others, up to the nineteenth century and beyond, have argued that it's all about physics, that you can explain consonance in terms of physical relationships having to do with these harmonic ratios."

The belief that math and music are closely interrelated is still widespread today. Indeed, an entire industry has been built around the so-called Mozart effect, the controversial claim that listening to Mozart or other complex music provides temporary boosts in mathematical abilities because the brain regions involved in processing music are also involved in other mental tasks, including math.

Taking note: Purves, left; Joshua Tan, a junior, middle; and Kamraan Gill, a medical student and research associate. Michael Zirkle

Dissatisfied with the explanations for music's appeal advanced by philosophers, scientists, and poets alike, Purves, along with Schwartz and Catherine Howe Ph.D. '03 (then a postdoctoral fellow at Duke, now a resident in psychiatry), decided to investigate for themselves, using an approach based on biology and evolution. They began by asking how the natural environment might have shaped our musical preferences.

A key aspect of music is that it is tonal, meaning it is made up of regularly repeating or "periodic" sounds. Most periodic sounds in nature are made by living things. "In order to produce periodic sounds, you need a system that has an oscillator coupled to an energy source that is able to sustain vibrations," Schwartz says. "All living things, whether you're talking about insects or frogs or you and me, can produce and control energy."

The most salient periodic sounds humans hear on a consistent basis are the vocalizations we make to communicate with one another by means of speech, Purves says. And speech, like music, is tonal. Furthermore, the tonal quality of speech is specifically associated with the production of vowels, which in English include the sounds represented by the letters a, e, i, o, and u. That's because our vocal cords only vibrate during vowel production. (In contrast, spoken consonants, which in English include sounds represented by the letters b, f, t, p, and so on, don't require the vocal cords and are thus not tonal.)

"If you think of music as being generated by a string that's plucked on a guitar, that's very similar physically to the vibrations of the vocal cords during vowel production," Purves explains.

From this knowledge, the team hypothesized that speech played a major role in shaping the evolution and development of the human auditory system, including the tonal preferences found in music. The team presented evidence supporting this idea in a 2003 study published in the Journal of Neuroscience. In the study, Purves, Schwartz, and Howe took short, ten- to twenty-second sentences spoken by more than 600 speakers of English and other languages and broke them into 50- to 100-millisecond sound bites. The resulting 100,000 or so sound segments were then manipulated to cancel out the peculiarities that distinguish one person's speech from another, such as the pitch differences in men's and women's voices.

"What you end up with is what is common to all the speech sounds," Schwartz says: the vowels and the frequencies they produce when vocalized. Graphed, the homogenized speech sounds resembled jagged peaks and valleys. Remarkably, the peaks, which represent strong concentrations of acoustic energy, corresponded to most of the twelve notes in the chromatic scale.

"The peaks happen to occur at ratios that are exactly those that define the chromatic scale," Schwartz says. "It's one of those examples of a picture being worth a thousand words."

After establishing a statistical link between speech and music, the researchers tried to determine which aspects of speech were generating the same intervals found in music.

Figure 1. Human vocal tract. The tonal quality of speech is specifically associated with the production of vowels, because our vocal cords only vibrate during vowel production. Purves Lab.

Human speech begins with the vocal cords. Air forced up by the lungs passes over the cords, causing them to vibrate at certain frequencies, depending on the force of the air and the position of the vocal cords. These "base" frequencies are then modified by the soft palate, tongue, lips, and other parts of the vocal tract, filtering out some frequencies and creating additional "resonant" ones.

Purves likens our vocal cords to the strings of a guitar and the rest of the vocal tract to the guitar's body. "You pluck a guitar string absent the guitar, it sounds like hell," he says. "You need the resonance of the body of the guitar to transform the sound into something that sounds good, and that is basically what the vocal tract does."

The most energetic resonant frequencies of speech are called formants, and they are critical for vowel enunciation. Nearly every vowel can be characterized by two main formants that can be expressed as a numerical ratio. (The frequency of the first formant is between 200 and 1,000 hertz and the frequency of the second formant is between 800 and 3,000 hertz, depending on the vowel.)

In a 2007 study comparing speech and music, reported in the journal Proceedings of the National Academy of Sciences this past October, Purves, Deborah Ross, a postdoctoral research fellow in Duke's Center for Cognitive Neuroscience, and Jonathan Choi M.D. '08, a research associate, asked native speakers of English and Mandarin to pronounce vowel sounds both as part of individual words and as part of a series of short monologues.

They then used a spectrum analyzer to break apart vowels to reveal their component formants. A comparison of the ratios of the first and second vowel formants and the numerical ratios of musical intervals revealed that the two sets of ratios were very similar. "In about 70 percent of the speech sounds, these ratios were bang-on musical intervals," Purves says.

For example, when people say "o," as in the syllable "bod," the frequency ratio between the first two formants might correspond to a major sixth, the interval between the musical notes C and A. When they say the "oo" sound in "booed," the ratio matches a major third, the distance between C and E.
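The comparison described above can be sketched as a small calculation: take the ratio of the second formant to the first, fold it into a single octave, and look for the closest musical interval. The interval table below uses common just-intonation ratios, and the 25-cent tolerance, the function names, and the example formant values are all illustrative assumptions. This is not the published method or data.

```python
import math

# Common just-intonation ratios for the chromatic intervals. These are an
# assumption; the article does not list the study's reference ratios.
INTERVALS = [
    ("octave", 1 / 1),
    ("minor second", 16 / 15),
    ("major second", 9 / 8),
    ("minor third", 6 / 5),
    ("major third", 5 / 4),
    ("fourth", 4 / 3),
    ("tritone", 7 / 5),
    ("fifth", 3 / 2),
    ("minor sixth", 8 / 5),
    ("major sixth", 5 / 3),
    ("minor seventh", 9 / 5),
    ("major seventh", 15 / 8),
    ("octave", 2 / 1),
]


def fold_into_octave(ratio):
    """Halve or double a frequency ratio until it lies in the range [1, 2)."""
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio


def nearest_interval(f1_hz, f2_hz, tolerance_cents=25.0):
    """Return (interval name, deviation in cents), or None if nothing is close.

    One cent is 1/1200 of an octave, so distances are measured on a log scale.
    """
    ratio = fold_into_octave(f2_hz / f1_hz)
    name, reference = min(
        INTERVALS, key=lambda item: abs(math.log2(ratio / item[1]))
    )
    deviation = 1200 * abs(math.log2(ratio / reference))
    return (name, deviation) if deviation <= tolerance_cents else None


# Hypothetical formant pairs in hertz, chosen only to illustrate the idea.
print(nearest_interval(600, 1000))   # ratio 5:3
print(nearest_interval(300, 1500))   # ratio 5:1, folded to 5:4
print(nearest_interval(1000, 1290))  # falls between two intervals
```

With these made-up inputs, a first formant of 600 hertz and a second of 1,000 hertz give a ratio of 5:3, which the sketch labels a major sixth, while a pair whose ratio falls between two intervals is reported as no match.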

The results were similar in Mandarin speakers. In both languages, an octave gap was the most common, while a minor sixth, which is the interval between the musical notes C and A-flat, was fairly uncommon, a pattern reflected in the musical preferences of many cultures around the world (see Figure 2).

For both English and Mandarin speakers, the ratios between the major formants in vowel sounds paralleled the most commonly used intervals in music worldwide, namely the octave, the fifth, the fourth, the major third, and the major sixth.

To Purves, the upshot is a simple truth: "There's a biological basis for music, and that biological basis is the similarity between music and speech," he says. "That's the reason that we like music."

Figure 2. Music in speech. Frequency ratios between the first two formants or areas of high energy produced when people make certain vowel sounds. For example, when you say the “o” (a) sound in “bod,” the frequency ratio between the first two formants (F1 and F2) matches a major sixth, the distance between C and A, as indicated on the piano keys. The vertical axis of the graphs shows loudness represented in decibels. The horizontal axis shows frequencies in hertz. Purves Lab.

Purves thinks that human speech can explain more than just relative consonance. It might also hold the key to explaining music's most mysterious property, the one that makes it enchanting even to people with no musical training.

Much of music's power lies in its ability to communicate without words, to speak directly to our emotions. Melodies can etch themselves into our brains and remain for a lifetime. Songs can break your heart, or help to mend it. "We're all familiar with the fact that music has an emotional impact," says Purves. "That's one of the reasons we like it. It generates different emotional responses, and producing those responses is clearly the goal of a lot of musical compositions."

In particular, music composed in major scales sounds bright, spirited, and happy, while minor scale music tends to sound sad, lugubrious, and dark. Musicians have known about and used these relationships for centuries to great effect, but there is no consensus about why major and minor tone combinations evoke the emotions that they do.

In a study currently under way, the researchers are testing the hypothesis that when people talk in a happy way, the formant relationships in their vowels correspond with major keys. And when they talk in a bored, neutral, or sad way, their formant relationships are minor. "Whenever we've heard happy speech, we've tended to hear major-scale tonal ratios," Purves says. "Whenever we've heard sad speech, minor tones tend to be involved.

"We have thus been making those associations since the day we were born. Perhaps when we hear music in a major scale, we unconsciously associate it with happy speech and tend to have that emotional response, and vice versa for music in a minor scale."

Dale Purves' musical research is generally consistent with other work on human perception he has conducted in an attempt to understand vision. Purves has long argued that when we see, our brain is not so much analyzing the present as it is constructing a perception based on past experiences.

Dissatisfied with conventional explanations for how vision works, Purves hypothesized that the properties of vision must somehow be shaped by the world: that vision, and perception in general, must have adapted through evolution to the environment we live in. "We need to understand the environment in which we have to make our living, or we won't be making that living for very long," Purves says.

The evidence that vision works in this counterintuitive way is most apparent in visual illusions, in which discord exists between how people perceive the world and how the world really is. A good example is the standard "brightness contrast" illusion found in psychology textbooks. When pictures of two identically shaded tiles are placed against different shades of gray, people see the tile on the dark background as lighter than the tile on the lighter background. Many scientists explain these illusions as perceptual errors made by an otherwise well-functioning visual system. Purves hypothesized "that they were not in fact mistakes, but correct perceptions if you understood what the visual system is actually trying to do," Schwartz says.

Purves' alternative explanation is based on a long-recognized problem with the visual and other sensory systems, including the auditory system. Any aspect of a given sensory stimulus, such as the amount of light reaching the eye from a tile's surface in the example above, can arise from an infinite number of real-world scenarios. For example, our eyes receive exactly the same physical stimulus from a highly reflective surface in weak lighting and a dull surface in stronger lighting.

So how does the brain distinguish between the two real-world scenarios and respond appropriately? Purves and his collaborators argue that the visual part of the brain generates perceptions on the basis of what a given stimulus, such as an image on the retina, has signified in the past.

According to this view, humans and other visual animals do not see the world as it really is. They see it through the filters of their sophisticated sense organs and brains, and they also see it through the distorted lens of experience, both their own and that of their species. Understood in this way, visual illusions are not perceptual errors on the part of our visual system, but correct perceptual decisions made in unnatural settings. "What are termed perceptual errors or illusions are in fact evidence of just how sophisticated the visual system is," Schwartz says.

Purves' musical research extends his theory to the auditory system because, here too, experience plays a critical role: Our exposure to speech has shaped our preferences for the kinds of sounds that we like to hear.

A major implication of his research is that music is not an abstract phenomenon explained by mathematical formulas, nor is the human love of music a cosmic coincidence begging for a mystical explanation. It is a wondrous byproduct of evolution.

"Pythagoras wanted to explain music in mathematical ratios. That just doesn't work," Purves says.

"Music is far more complex than Pythagoras. The reason doesn't have to do with mathematics. It has to do with biology."