Generalized linear models in Bayesian phylogeography

Description
Bayesian phylogeography is a framework that has enabled researchers to model the spatiotemporal diffusion of pathogens. In general, the framework assumes that discrete geographic sampling traits follow a continuous-time Markov chain process along the branches of an unknown phylogeny that is informed through nucleotide sequence data. Recently, this framework has been extended to model the transition rate matrix between discrete states as a generalized linear model (GLM) of predictors relevant to the pathogen. In this dissertation, I focus on these GLMs, describe their capabilities and limitations, and introduce a pipeline that may enable more researchers to use this framework.
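A minimal sketch of how such a GLM can define the transition rate matrix, assuming the log-linear parameterization commonly used for phylogeographic GLMs, in which the log of each off-diagonal rate is a sum of predictor values weighted by a coefficient and a 0/1 inclusion indicator. The function name, the predictor names, and the toy values are hypothetical, and the predictors are assumed to be already log-transformed and standardized.

```python
import math


def glm_rate_matrix(predictors, coefficients, indicators):
    """Build a CTMC rate matrix from pairwise predictors (illustrative only).

    predictors:   dict mapping predictor name -> K x K list of lists
    coefficients: dict mapping predictor name -> effect size (beta)
    indicators:   dict mapping predictor name -> 0 or 1 (inclusion indicator)
    """
    names = list(predictors)
    k = len(predictors[names[0]])
    q = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i == j:
                continue
            # log(rate_ij) = sum over predictors of beta * delta * x_ij
            log_rate = sum(
                coefficients[n] * indicators[n] * predictors[n][i][j]
                for n in names
            )
            q[i][j] = math.exp(log_rate)
        # The diagonal is still 0.0 here, so this sums the off-diagonal rates
        # and makes the row sum to zero, as a CTMC generator requires.
        q[i][i] = -sum(q[i])
    return q


# Hypothetical three-location example with two made-up predictors.
toy_predictors = {
    "distance": [[0.0, -0.5, 1.2], [-0.5, 0.0, 0.3], [1.2, 0.3, 0.0]],
    "origin_density": [[0.0, 0.8, 0.8], [-1.1, 0.0, -1.1], [0.3, 0.3, 0.0]],
}
toy_q = glm_rate_matrix(
    toy_predictors,
    coefficients={"distance": -1.0, "origin_density": 0.5},
    indicators={"distance": 1, "origin_density": 0},
)
```

Setting an indicator to 0 removes that predictor from every rate, which is what allows support for a predictor to be summarized by how often its indicator is 1 across posterior samples.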

I first demonstrate how a GLM can be employed and how the support for the predictors can be measured, using influenza A/H5N1 in Egypt as an example. Second, I compare the GLM framework to two alternative frameworks of Bayesian phylogeography: one that uses an advanced computational technique and one that does not. For this assessment, I model the diffusion of influenza A/H3N2 in the United States during the 2014-15 flu season with five methods spanning the three frameworks. I summarize metrics of the phylogenies created by each and demonstrate their reproducibility by performing analyses on several random sequence samples under a variety of population growth scenarios. Next, I demonstrate how discretization of the location trait for a given sequence set can influence phylogenies and support for predictors. That is, I perform several GLM analyses on a set of sequences, changing how the sequences are pooled, and then show how aggregating predictors at four levels of spatial resolution alters posterior support. Finally, I provide a solution for researchers who wish to use the GLM framework but may be deterred by the tedious file manipulation it requires. My pipeline, which is publicly available, should alleviate concerns about the difficulty and time-consuming nature of creating the files necessary to perform GLM analyses. This dissertation expands the knowledge of Bayesian phylogeographic GLMs and will facilitate the use of this framework, which may ultimately reveal the variables that drive the spread of pathogens.
Date Created
2017
Agent

Bayesian Phylogeography of Influenza A/H3N2 for the 2014-15 Season in the United States Using Three Frameworks of Ancestral State Reconstruction

Description

Ancestral state reconstructions in Bayesian phylogeography of virus pandemics have been improved by utilizing a Bayesian stochastic search variable selection (BSSVS) framework. Recently, this framework has been extended to model the transition rate matrix between discrete states as a generalized linear model (GLM) of genetic, geographic, demographic, and environmental predictors of interest to the virus, with BSSVS used to estimate the posterior inclusion probability of each predictor. Although the latter appears to enhance the biological validity of ancestral state reconstruction, the phylogenies created by the two methods have yet to be compared.
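As a rough illustration of how a posterior inclusion probability can be summarized, the sketch below averages sampled 0/1 indicators for one predictor and converts the result to a Bayes factor against a prior inclusion probability. The function names, the sample values, and the prior of 0.5 are assumptions made for this example; they are not taken from the paper.

```python
import math


def inclusion_probability(indicator_samples):
    """Fraction of posterior samples in which the predictor was included."""
    return sum(indicator_samples) / len(indicator_samples)


def inclusion_bayes_factor(indicator_samples, prior_inclusion):
    """Posterior odds of inclusion divided by prior odds of inclusion."""
    post = inclusion_probability(indicator_samples)
    if post == 1.0:
        # Every sample included the predictor, so the odds are unbounded.
        return math.inf
    posterior_odds = post / (1.0 - post)
    prior_odds = prior_inclusion / (1.0 - prior_inclusion)
    return posterior_odds / prior_odds


# Hypothetical indicator trace for one predictor (1 = included in that sample).
trace = [1, 1, 1, 0]
support = inclusion_probability(trace)
bf = inclusion_bayes_factor(trace, prior_inclusion=0.5)
```

In practice the indicator trace would come from the sampler's log output after burn-in, and the prior inclusion probability would be whatever the analysis specified.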

In this paper, we compare these two methods, while also using a primitive method without BSSVS, and highlight the differences in phylogenies created by each. We test six coalescent priors and six random sequence samples of H3N2 influenza during the 2014–15 flu season in the U.S. We show that the GLMs yield significantly greater root state posterior probabilities than the two alternative methods under five of the six priors, and significantly greater Kullback-Leibler divergence values than the two alternative methods under all priors. Furthermore, the GLMs strongly implicate temperature and precipitation as driving forces of this flu season and nearly unanimously identify a single root state, which exhibits the most tropical climate during a typical flu season in the U.S.
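The Kullback-Leibler divergence mentioned above can be sketched as follows, under the assumption that it compares the posterior distribution over root states with a reference prior, taken here to be uniform over the locations. The function name and the probability values are hypothetical.

```python
import math


def kl_divergence(posterior, prior=None):
    """KL(posterior || prior) in nats; prior defaults to uniform (an assumption)."""
    k = len(posterior)
    if prior is None:
        prior = [1.0 / k] * k
    # Terms with zero posterior mass contribute nothing to the sum.
    return sum(p * math.log(p / q) for p, q in zip(posterior, prior) if p > 0)


# Hypothetical root state posteriors over four locations.
diffuse = [0.25, 0.25, 0.25, 0.25]      # no information gained over the prior
concentrated = [0.91, 0.03, 0.03, 0.03]  # most mass on a single root state

kl_diffuse = kl_divergence(diffuse)
kl_concentrated = kl_divergence(concentrated)
```

A larger value indicates that the posterior has moved further from the prior, which is consistent with a reconstruction that concentrates probability on one root state.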

The GLM, however, appears to be highly susceptible to sampling bias compared with the other methods, which casts doubt on whether its reconstructions should be favored over those created by alternative methods. We report that a BSSVS approach with a Poisson prior demonstrates less bias toward sample size under certain conditions than the GLMs or primitive models, and we believe that the connection between reconstruction method and sampling bias warrants further investigation.

Date Created
2017-02-07
Agent

Combining Phylogeography and Spatial Epidemiology to Uncover Predictors of H5N1 Influenza A Virus Diffusion

Description

Emerging and re-emerging infectious diseases of zoonotic origin like highly pathogenic avian influenza pose a significant threat to human and animal health due to their elevated transmissibility. Identifying the drivers of such viruses is challenging, and estimation of spatial diffusion is complicated by the fact that the variability of viral spread from locations could be caused by a complex array of unknown factors. Several techniques exist to help identify these drivers, including bioinformatics, phylogeography, and spatial epidemiology, but these methods are generally evaluated separately, without regard to how they complement one another. Here, we studied an approach that integrates these techniques and identifies the most important drivers of viral spread, focusing on H5N1 influenza A virus in Egypt because of the country's recent emergence as an epicenter for the disease. We used a Bayesian phylogeographic generalized linear model (GLM) to reconstruct spatiotemporal patterns of viral diffusion while simultaneously assessing the impact of factors contributing to transmission. We also calculated the cross-species transmission rates among hosts in order to identify the species driving transmission. The densities of both human and avian species were supported contributors, along with latitude, longitude, elevation, and several meteorological variables. Also supported was the presence of a genetic motif found near the hemagglutinin cleavage site. Various genetic, geographic, demographic, and environmental predictors each play a role in H5N1 diffusion. Further development and expansion of phylogeographic GLMs such as this will enable health agencies to identify variables that can curb virus diffusion and reduce morbidity and mortality.

Date Created
2015-01-01
Agent