Full metadata
Title
A composite genome approach to identify phylogenetically informative data from next-generation sequencing
Description
Background
Improvements in sequencing technology now allow easy acquisition of large datasets; however, analyzing these data for phylogenetics can be challenging. We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time-consuming steps of de novo whole genome assembly, multiple genome alignment, and annotation.
Results
In simulations, SISRS is able to identify large numbers of loci containing variable sites with phylogenetic signal. For genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers, from which we estimated the phylogeny of placental mammals. Using datasets with different levels of missing data, we recovered eight phylogenies that resolved the basal relationships among mammals. The three alternate resolutions of the basal relationships are consistent with the major hypotheses for the relationships among mammals, all of which have been supported previously by different molecular datasets.
Conclusions
SISRS has the potential to transform phylogenetic research. This method eliminates the need for expensive marker development in many studies by using whole genome shotgun sequence data directly. SISRS is open source and freely available at https://github.com/rachelss/SISRS/releases.
Date Created
2015-06-11
Contributors
- Schwartz, Rachel (Author)
- Harkins, Kelly (Author)
- Stone, Anne (Author)
- Cartwright, Reed (Author)
- Biodesign Institute (Contributor)
- Center for Evolution and Medicine (Contributor)
- College of Liberal Arts and Sciences (Contributor)
- School of Human Evolution and Social Change (Contributor)
- School of Life Sciences (Contributor)
Resource Type
Extent
10 pages
Language
eng
Copyright Statement
In Copyright
Primary Member of
Identifier
Digital object identifier: 10.1186/s12859-015-0632-y
Identifier Type
International standard serial number
Identifier Value
1471-2105
Series
BMC BIOINFORMATICS
Handle
https://hdl.handle.net/2286/R.I.41539
Preferred Citation
Schwartz, R. S., Harkins, K. M., Stone, A. C., & Cartwright, R. A. (2015). A composite genome approach to identify phylogenetically informative data from next-generation sequencing. BMC Bioinformatics, 16(1). doi:10.1186/s12859-015-0632-y
Level of coding
minimal
Cataloging Standards
Note
The electronic version of this article is the complete one and can be found online at: http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0632-y
System Created
- 2017-03-01 04:48:04
System Modified
- 2021-08-16 02:23:30