Gosselin, Kevin

Description

The purpose of this investigation is to apply machine learning algorithms to de-identified, historic oncology clinical trial data to assess the theoretical use of predictive modeling for deriving potential clinical practice recommendations. Within this study, electronic medical records (EMR) from the HonorHealth Virginia G. Piper Institute underwent data visualization to identify potential correlations and trends critical for model creation, and to identify potential expansions or limitations of the model's scope. The hypothesis pursued after data visualization was that a predictive model for 6-month survival could be developed. The current standard, physician estimation, is approximately 56.5% accurate at 6 months. This study created supervised learning models using decision trees, k-nearest neighbors (KNN), support vector machines (SVM), and ensemble methods, with combinations of LASSO logistic regression and Know-GRRF random forest used for feature selection. An SVM trained on the combined set of LASSO- and Know-GRRF-selected features produced the highest-performing model, with 75.5% accuracy and an AUC of 0.82. This study demonstrates the potential for applying predictive modeling to readily available EMR data to drive clinical practice recommendations. With further development, the models could potentially serve as an ancillary tool for starting patient-physician conversations about survival and life expectancy.
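The abstract reports two metrics for the best model: accuracy (75.5%) and AUC (0.82). The record does not describe how they were computed. The following is a minimal, generic sketch of what each one measures for a binary 6-month survival classifier. The function names, the 0.5 decision threshold and the toy labels and scores are hypothetical illustrations, not the study's code or data.

```python
from itertools import product


def accuracy(labels, scores, threshold=0.5):
    """Fraction of patients whose thresholded score matches the observed outcome.

    The 0.5 threshold is an assumption for illustration only.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels, scores):
    """ROC AUC as the probability that a randomly chosen positive case is
    scored above a randomly chosen negative case (ties count as half).

    This equals the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = survived 6 months,
# 0 = did not; scores are a model's predicted probability of 6-month survival.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

print(f"accuracy = {accuracy(labels, scores):.3f}")  # accuracy = 0.750
print(f"AUC      = {roc_auc(labels, scores):.3f}")   # AUC      = 0.875
```

Accuracy depends on the chosen decision threshold, while AUC summarizes how well the scores rank survivors above non-survivors across all thresholds. This is why reporting both gives a fuller picture of a survival classifier than accuracy alone.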

Date Created

2019-05

Agent

Co-author: Li, Richard Longfei
Co-author: Liu, Li
Co-author: Gosselin, Kevin
Thesis director: Liu, Li
Committee member: Gosselin, Kevin
Contributor (ctb): Harrington Bioengineering Program
Contributor (ctb): Barrett, The Honors College