The tsunamis have subsided and the continent-sized forest fires have burned out. Gone are the giant dinosaurs and three-quarters of all the other plant and animal species on Earth. They’ve been done in by the cold and dark of 750,000 years of heightened volcanism and the coup de grâce delivered by a six-mile-wide asteroid slamming into what is now Mexico.
But there on a beach is a cluster of scrappy, versatile little dinosaurs scratching out a living on what’s left over. They can eat most anything they find in the tidal zone. And perhaps most important, they’re able to fly to seek better habitats and encounter other groups of birds. Probably resembling a modern plover, these beachcombers are the ancestors of the 10,000 species of birds we know today, from pinkie-sized hummingbirds to seven-foot-tall ostriches.
This story emerges from a gene-sequencing effort of unprecedented ambition and scope that published twenty-eight papers simultaneously in coordinated special issues of Science, Genome Biology, and GigaScience this past December. The global, four-year collaboration sets the origin of modern birds at half the age some paleontologists have argued and reorganizes major portions of the bird family tree. The redrawn avian phylogeny, or family tree, offers new insights into the evolution of flight and singing, the way chickens lost their teeth, the often dramatic differences between males and females, and the historical thread that links birds and crocodiles to their dinosaur forebears.
At the center of the scientific effort, communicating, guiding research and writing, and calmly mediating disputes, was Duke neurobiologist Erich Jarvis, who was one of three scientists leading the massive collaboration.
More than simply upsetting the bird world’s apple cart, this simultaneous analysis of forty-eight whole genomes shows how to get at some of the deeper secrets of all evolving genomes. And it marks a new age of truly global, hyper-connected, big-data science. The international project began in China and spread across more than twenty countries. Collaborating scientists were never all in the same room, and many never even met in person. They created and shared enormous amounts of data that required supercomputers and new statistical techniques just to parse.
“The thing that blew me away was the organizational magnitude of this study,” says Alan Feduccia, professor emeritus and former chair of biology at the University of North Carolina at Chapel Hill, who first proposed this sort of rapid bird evolution in the 1970s and has tracked Jarvis’ career for the last fifteen years. “Twenty-eight studies published within a week, 200 scientists involved, supercomputers, which represent something like over 400 years of computing time, forty-eight new avian genomes. I’m like, holy cow, this is a brave new world here we’re dealing with!”
Jarvis isn’t a systematist, a person who classifies species into family trees. He isn’t an evolutionary biologist or paleontologist either. He isn’t even all that interested in birds. To him, this enormous effort was merely a means to an end. Jarvis is a brain scientist who has devoted his career to using the bird brain as a model for how speech works in the human brain. He wanted to see how three or four families of vocal-learning birds are related to one another and what genes they used to make speech. He also needed some closely related non-vocal-learning bird species for comparison to see what parts of the genome were different. “But the concept of ‘closely related’ depends on which version of the bird phylogeny you trust,” he says.
Over the past decade, Jarvis’ quest for speech-related genes had led him to participate in genome sequencing of the zebra finch, a songbird that’s a laboratory mainstay, and the budgerigar, a small Australian parrot sold in American pet shops as the parakeet. The zebra finch genome, which was published in 2010, cost about $8 million. The budgie genome was done as a test of three competing technologies, resulting in a Duke bird named Mister B becoming the most-sequenced vertebrate to date, with more than 300 complete readings of its genome.
Through efforts like this and the demand for still more genomes, the technology for doing “whole-genome” sequencing, which reads absolutely every letter of DNA in the animal’s chromosomes, has grown faster, cheaper, and more widely available. Nowhere is that more true than at BGI in Shenzhen, China. Formerly known as the Beijing Genomics Institute, BGI has become the world’s go-to source for ambitious non-human sequencing efforts like this. Just a few weeks before the bird phylogeny was published, Science had an insect phylogeny on its cover that also had been completed by BGI.
In early 2010, Jarvis was looking for more vocal-learning genomes. In the hopes of getting the right birds sequenced, he “selfishly” accepted an invitation to become involved with an international consortium called G10K, which hopes to sequence 10,000 vertebrate genomes.
At about the same time, Copenhagen colleagues Guojie Zhang and M. Thomas P. Gilbert were talking about the bird-sequencing project they could get funded at BGI. Zhang is an evolutionary biologist with appointments at both the University of Copenhagen and BGI. British-born and Oxford-educated Gilbert is an ancient-DNA specialist at the Natural History Museum of Denmark.
The Copenhagen scientists set their sights on the scruffy, unloved urban pigeon as their first project, but it proved difficult to place on the overall bird phylogeny. Then, like Jarvis, they hit on the idea of joining forces with the G10K to add more birds for sequencing. For one thing, a lot of the sequencing the project needed was starting to be done at BGI, where Zhang is associate director of the GeneBank that would be doing the DNA sequencing and much of the computer science.
“Guojie writes to Steve O’Brien, who’s the big overall boss of G10K,” Tom Gilbert recalls. “Steve says, ‘Great idea, let’s involve the actual bird people in G10K.’ He cc’s Erich and Klaus-Peter Koepfli and Warren Johnson [both of the Smithsonian Conservation Biology Institute in the U.S.]. Very quickly, Erich became the most enthusiastic person. He became very talkative on e-mail at this point, and he basically listed the ten or so species that the bird group of the G10K were planning to send to BGI, which were not all picked for phylogenetic reasons; about half were picked because they were people’s favorite birds for whatever they were doing.”
Jarvis was guilty of that, of course; he mainly wanted his vocal birds. Others had their own pet reasons. But this approach wouldn’t be the best way to answer family-tree questions, Gilbert explains. If they were going to go to all this trouble, they might as well also resolve the long-simmering debate over whether modern bird species are 66 million or 100 million years old.
Jarvis, Zhang, and Gilbert compared their lists and realized half of the species overlapped and that, if they replaced the overlap with other species, they would have twenty species across the phylogeny. Some of these were already being sequenced in other collaborations with BGI. It seemed that re-mapping the bird family tree with these whole-genome sequences might be within reach if they could get the right birds. “Basically at this point, Guojie started saying, ‘Okay, we can do one more, we can do one more,’ ” Gilbert says. “ ‘It’d be really great to include this one.’ ” Relatively soon, they were up to forty-five new species to add to the previously published chicken, turkey, and zebra finch.
Getting fresh, useful DNA from obscure birds, living and dead, to represent every major order of the bird family tree turned out to be easier said than done. For Macqueen’s bustard and the yellow-throated sand grouse, Gilbert recalls, “I put one of my students on a plane and flew them over to Sharjah [a bird sanctuary an hour from Dubai] for a day to go to this breeding center and let them bleed the birds and bring back the DNA.” After weeks of persuasive telephone calls with a stubborn zookeeper who held the rare and elusive cuckoo roller, “I put one of my grad students in the car and said, ‘Drive 1,000 kilometers to southern Germany, go to this vet, and get the damn sample.’ ” At Duke, Jarvis and his research analyst, Jason Howard, obtained samples from collections at Louisiana State University, the Field Museum of Chicago, the Carolina Raptor Center, and the North Carolina Zoo in Asheboro.
Most of those samples then came, carefully packed in dry ice, to Duke or to Copenhagen for DNA extraction. Working quickly with a hard-won vial of fluid or a small chunk of flesh and marking labels carefully at each step of the process, the labs ran an exacting series of steps to liberate high-quality pure DNA. This effort now continues on more bird species with a team of undergraduates led by Jarvis lab research analyst Carole Parent.
“But the DNA extraction is only the first step of the long march,” Zhang says. The DNA vials were then shipped to BGI for state-of-the-art sequencing in a bank of expensive machines that can tease out every one of the 1.2 billion As, Ts, Cs, and Gs of a bird’s DNA. After quality control and library-building, which takes several weeks for each sample, whole-genome sequencing captures the bird’s 14,000 genes, plus all the other DNA that controls when and how genes are operated.
By January 2011, most of the specimens had been lined up. By March, the DNA was off to China. By July, it was pure genomic data—vast, confusing, contradictory, but potentially richer than any story ever told about the birds. “We start doing initial analyses, and we’re finding there are all sorts of problems,” Gilbert says. “It turns out you can’t just compare genomes and genomes, you’ve got to do standardization.” They had terabytes of mismatched data, and the differences they were looking for could have been as small as one letter. The three leaders laugh now at their expectation that they’d have a phylogeny published by the end of 2011.
Computer scientists from several labs worked for two more years on the standardization problem for genome alignments, annotations, and generating trees. At one point Zhang’s group devoted twenty people for a solid month just to build a database that could recalibrate all the data. “You don’t see this in the final data because it’s just quality-control things, these mundane, basic tasks,” Gilbert says.
To hear Jarvis tell it, his co-leadership of the consortium just sort of happened. As the project began to mushroom, “we invited additional experts who could add valuable insights to join the project,” Jarvis says. Each week, he sent out an agenda and chaired the weekly conference with Zhang and Gilbert, a one- to two-hour virtual meeting with data and graphs that grew to twenty or thirty scientists at times. After each conference, there would be a flurry of follow-up calls and e-mails.
“Erich and I had many phone calls before and after the weekly conference calls about what needed to be done,” says Tandy Warnow, the Founder Professor of bioengineering and computer science at the University of Illinois, who was brought in to guide the computational and statistical challenge of drawing a phylogeny of unprecedented proportions. “This was neither a hardware problem nor a software problem. It was a statistical estimation problem that required a fundamentally different approach.”
Of course, Jarvis isn’t an expert in these areas either. “Erich could have chosen to ignore these analytical challenges,” Warnow says. “But he wanted to understand all the mathematical issues involved, why the trees we were producing turned out differently, and what the differences implied about what was going on in the avian evolutionary history. So he did a very wise thing: He called on the people who understood the mathematical issues, me and my student Siavash Mirarab and Ed Braun, a [University of Florida] biologist who has a very good understanding of statistical issues, to explain the issues and help resolve debates within the group.”
Some of the career bird systematists began to resent seeing non-systematists in the leadership of a new phylogeny. Mutinies were quietly proposed along the margins. Jarvis put out fires, settled disputes, made people feel they were being heard, and kept the scientific army advancing. “Part of the goal is to let people talk,” says Jarvis, who is supported by the Howard Hughes Medical Institute and several grants from the National Institutes of Health. “To get their word in. To make sure that their important opinion is heard and dealt with, because if it wasn’t dealt with, we couldn’t get consensus, and we couldn’t get agreement to have a paper that all these people could sign off on.”
The volume of e-mail (nuanced, detailed, well-reasoned, prickly, or argumentative) is more than Jarvis cares to recall. For two years, additional scientific groups around the world were enlisted to analyze parts of the data, spinning out a constellation of papers about penguins, ibises, viral remnants, colored plumage, and so forth, and swelling the numbers of collaborators and e-mails still further. With colleagues on every continent except Antarctica, the sun never set on this collaboration.
“Waking up in the morning, I would have a series of e-mails I’d have to deal with,” Jarvis says. “I’d go from my bed to my desk, which is in my room. It was a bunch of e-mails about folks in the lab and outside the lab, phone calls, contacts with editors of course, and then evaluating and steering the actual experiments and data analyses. And then once I’m done with the morning work, I’d come to the lab, talk to the people in the lab, manage projects here, and so forth. By the afternoon, I’m dealing with more e-mails from the phylogenomics group. Then I go eat. I go to my Cobo brothers dance class to de-stress or go out salsa dancing, and after that I go back home and take care of what I need to do in the evening before going to sleep. I do that over and over again.” He worked until at least midnight for two solid years. “I made sure that once I got to 3 a.m. I’d force myself to go to bed. Because I know if I stay up past 3 a.m., I am not good for the rest of the day.”
As draft papers emerged and timelines for a December 2014 wave of publications were set, Jarvis’ role grew to include negotiating with several journal editors on one side, and on the other “keeping things together, pre-reviewing all the papers, and keeping collaborations intact, as well as helping to actually steer the direction of the research in other people’s labs, not just my own.” He pre-reviewed all of the papers, even the ones he wasn’t an author on.
“I’ve never seen Erich lose his cool,” Gilbert says. “It’s quite amazing. Maybe it’s because he was always exhausted. I do wonder if he slept.”
The bird consortium has pioneered new analytical tools that other research groups can exploit, and it has uncovered some important new details of how rapid evolution happens at the genomic scale. The effort will be moving on with the same three leaders, though it may shrink a bit for practicality, and it will keep sequencing more bird genomes. “If the next tree turns out to be the same kind of tree, we can change the classification of birds, and that will change the birders’ checklists and who knows what,” a better-rested Jarvis says in his sunny Bryan Research Building office, surrounded by academic honors and bird books. “That is a big deal.”
The rest of us will have to get used to these papers with 100 authors, UNC’s Feduccia says, because this is how big-data science is going to play out. “The thing that’s so impressive about Erich’s work is that, not only is he a brilliant scientist, beyond question, but this new style of research takes Herculean organizational skill, and he apparently has it.”
“I’ve stopped thinking of the boundaries of my four rooms as my lab,” Jarvis says. “I have more control over what’s happening here than anywhere else. But from a psychological point of view, my boundaries have expanded to other labs at Duke, to the rest of Duke, to other universities in the United States, and the rest of the world. To do big projects, you have to pull down both physical and mental barriers and work with ‘We the Scientist’ more than ‘I the Scientist.’ ”
Bates is the director of research communications for Duke.
Follow @ErichJarvis on Twitter. Since the Avian Phylogeny papers launched last December, Jarvis has been posting tweets explaining the findings.