Crawley, Sunny S [1], Burleigh, J Gordon [2], Hilu, Khidir W. [3].

Ideal Dataset Shape in Molecular Phylogenetics: Vertical, Horizontal, or Combined?

As the cost of complete genome sequencing has declined, more complete chloroplast genomes are being sequenced, providing an abundance of characters for phylogenetic reconstruction. Data matrices composed of complete genome sequences are generally skewed toward numerous characters for relatively few taxa, resulting in “horizontal” datasets. These types of datasets are thought to provide greater accuracy, presumably resulting in a better approximation of the species phylogeny than datasets based on a single gene or a few genes. In contrast, datasets containing fewer genomic regions are more likely to provide denser taxon sampling, resulting in “vertical” datasets. Here, we compared analyses of “vertical” and “horizontal” datasets, each analyzed separately, using the Caryophyllales as a case study. We also contrasted these with analyses of a “combined” dataset, which inherently introduces a large proportion of missing genomic regions. Additionally, we performed random resampling analyses on the complete chloroplast genome dataset to investigate the effectiveness of using subsamples of characters equal in number to those in two selected genomic regions. The “vertical” dataset recovered a robust phylogeny approaching that based on the “horizontal” dataset. This was achieved despite 38% missing data, but with the added benefit of much denser taxon sampling. Analyses of the combined dataset recovered, with moderate support, the major clades of the order that were obtained in the partition analyses. Resampling of the “horizontal” dataset indicated that selecting appropriate genomic regions is much more effective at resolving the phylogeny of the Caryophyllales than using the same number of characters chosen at random.
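The random resampling of characters described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`resample_columns`, `missing_fraction`), the toy alignment, the choice to sample columns without replacement, and the treatment of `?` and `-` as missing data are all assumptions made for illustration.

```python
import random


def resample_columns(alignment, n_chars, seed=None):
    """Draw n_chars alignment columns at random, without replacement.

    `alignment` maps taxon name -> aligned sequence (all the same length).
    Sampling without replacement is an assumption, not a detail from the abstract.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    total = lengths.pop()
    if n_chars > total:
        raise ValueError("cannot sample more columns than the alignment has")
    rng = random.Random(seed)
    # Sort the sampled indices so the subsample keeps the original column order.
    columns = sorted(rng.sample(range(total), n_chars))
    return {
        taxon: "".join(seq[i] for i in columns)
        for taxon, seq in alignment.items()
    }


def missing_fraction(alignment, missing="?-"):
    """Proportion of matrix cells coded as missing (assumed symbols: '?' and '-')."""
    cells = "".join(alignment.values())
    return sum(ch in missing for ch in cells) / len(cells)


# Hypothetical toy matrix: three taxa, ten aligned characters.
toy = {
    "taxonA": "ACGTACGTAC",
    "taxonB": "ACGTTCGTAC",
    "taxonC": "AC??ACG-AC",
}

# A random subsample with as many characters as a (hypothetical) 4-column region.
subsample = resample_columns(toy, n_chars=4, seed=1)
```

Repeating the draw with different seeds, and inferring a tree from each subsample, would give a distribution against which a tree from the selected genomic regions could be compared. `missing_fraction` shows one way a figure such as the proportion of missing data in a matrix might be computed.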

1 - Virginia Tech, Biological Sciences, 2119 Derring Hall, Blacksburg, VA, 24061, USA
2 - University of Florida, Department of Biology, 220 Bartram Hall, 118526, Gainesville, FL, 32611, USA
3 - Virginia Tech, Biological Sciences, Blacksburg, Virginia, 24061, USA

