When considering evolution, we often think first about the visible physiological changes that species undergo. But changes in phenotype, those physical manifestations, are only the most apparent end results of intricate processes that occur at many molecular levels, including DNA sequence, gene expression, protein expression, and histone modification. Through genomic tools, scientists can measure all of these levels in multiple species in order to understand evolutionary patterns.
An obstacle to that understanding is the remarkable complexity of the datasets generated from those measurements. The molecular components that are measured are not the same across species, resulting in sizable quantities of difficult-to-analyze data. Fortunately, associate professor of biostatistics and medical informatics Sushmita Roy is an expert in just such complexity: she uses computational methods to perform comparative analysis, model functional behavior, map regulatory networks, and more. In research published in November in Nucleic Acids Research, her group developed tools to reveal evolutionary patterns across six plant species.
Roy’s collaborators, including UW–Madison’s Josh Coon, Jean-Michel Ané, and Michael Sussman, used new high-throughput proteomic methods to measure the plants’ proteomes (the full set of proteins that can be expressed), creating a large and complex dataset. Using a new computational approach led by Roy lab member Junha Shin, together with publicly available gene expression datasets and the new proteomic data, Roy and her collaborators were able to connect changes in protein levels to phenotypic traits. They were also able to determine that protein expression is more conserved than gene expression, meaning that protein levels change more slowly than gene expression as evolution marches on.
Finally, Roy’s computational tools made it possible to make comparisons across species, even when more was known about some species than others. “Now we can use information from a model organism in this collection of species and transfer it into another organism,” says Roy, “so we can annotate the function of new genes and predict their role in different processes.” This matters because the species in a comparative study have rarely been studied equally; researchers may have one or two well-studied species to compare with four or five lesser-known ones. This ability to apply information from one species to another, demonstrated in the Nucleic Acids Research study, is one way that the tools’ applications reach beyond the current work.
A second study, published in Genome Biology in January, adapted Roy’s tools to study the evolution of gene regulatory networks in a family of East African fish: cichlids. Cichlids are a good model to study because their genomes change rapidly within a short evolutionary timescale. Roy and her collaborators at the Earlham Institute in the United Kingdom were able to make systematic comparisons of how tissue-specific gene expression changed across six species of cichlids. The study, which included contributions from former Roy group graduate student Christopher Koch and current group member Sara Knaack, provided new insights into the connections between gene regulatory networks, adaptive traits, and relationships to the environment.
This novel application demonstrates that tools for comparative genomics open many new avenues of inquiry, especially as large, multi-omic datasets become more common. “One thing that I would like people to think about is measuring systematic data sets, across multiple species,” says Roy. “Go beyond [DNA] sequence to measure multiple molecular levels, including gene and protein expression and chromatin state to enable these types of comparisons.”