Mash-ups of Evolutionary Data

April 1, 2009
Speciation trail: detail from NESCent poster charting 74 million years of the evolution and geographic spread of hoofed animals. David M. Kidd and Samantha A. Price, NESCent

A hundred and fifty years after Charles Darwin assembled a mountain of disparate data into one grand synthesis, and fifty years after scientists began cranking out the gene-by-gene description of every life form they could get their hands on, you'd think it would be about time for a little more synthesis.

In a row of small offices that could pass for an insurance agency, Duke biology professor Kathleen Smith heads an experimental program aimed at jump-starting just that. "There's value in a half-a-century's data," she says with just a hint of understatement. But how do you begin to mine it?

The National Evolutionary Synthesis Center, or NESCent, has support from the National Science Foundation (NSF) and is housed in the Erwin Square Mill, a converted tobacco warehouse that Duke rents, between Central and East campuses.

Although many of the NESCent scientists have experience catching dangerous things in swamps and doing mind-numbing tasks to satisfy the needs of laboratory machines, here they sit in front of computers and stand, feet dry, in front of white boards.

They're creating mash-ups of related data from different disciplines of science and different orders of life, trying to get their heads around the patterns that might reveal some larger truths.

NESCent postdoctoral fellows Samantha Price and David Kidd, for example, combined several sets of data on the evolution and geographic spread of hoofed animals. The result is a detailed yet accessible poster depicting the 74-million-year history of the entire artiodactyl order superimposed on a series of maps that shows, with new clarity, when and where camels, cows, and antelopes went their separate ways.

"Visualization is so important for synthesis," says Price, who recently moved from Durham to the University of California at Davis. "When you've got a huge set of synthetic data, you can't really understand it without visualization."

The four-year-old NESCent, which is applying for a second round of NSF funding, is developing new visualization tools. It also hosts working groups of scientists from diverse fields around the country who are eager to start putting the pieces together around some common questions.

"At this point, we're not talking about the grand synthesis yet," says Smith, who shares leadership of NESCent with colleagues at the University of North Carolina at Chapel Hill and North Carolina State University. "But we do have the potential for really understanding phylogeny," the family tree of evolutionary history.

"We're living in an era when we have so much information that we have to go back to a synthetic mode of thinking again because we're starting to lose the forest for the trees," says Greg Wray Ph.D. '87, a professor of biology at Duke who sits on NESCent's advisory board. "There are now so many trees that we can't actually see how the pieces fit together anymore."