Structure-Regularized Partition-Regression Models for Nonlinear System-Environment Interactions

Description
Under different environmental conditions, the relationship between the design and operational variables of a system and the system’s performance is likely to vary and is difficult to describe with a single model. The environmental variables (e.g., temperature, humidity) are not controllable, while the variables of the system (e.g., heating, cooling) are mostly controllable. This phenomenon has been widely seen in the areas of building energy management, mobile communication networks, and wind energy. To account for the complicated interaction between a system and the multivariate environment under which it operates, a Sparse Partitioned-Regression (SPR) model is proposed, which automatically searches for a partition of the environmental variables and fits a sparse regression within each subdivision of the partition. SPR is an innovative approach that integrates recursive partitioning and high-dimensional regression model fitting within a single framework. Moreover, theoretical studies of SPR are conducted to derive oracle inequalities for the SPR estimators, which provide a bound on the difference between the risk of the SPR estimators and the Bayes risk. These theoretical studies show that the performance of the SPR estimator is almost (up to numerical constants) as good as that of an ideal estimator that can be theoretically achieved but is not available in practice. Finally, a Tree-Based Structure-Regularized Regression (TBSR) approach is proposed, motivated by the fact that model performance can be improved in certain scenarios by joint estimation across different subdivisions. It leverages the idea that models for different subdivisions may share some similarities and can borrow strength from each other. The proposed approaches are applied to two real datasets in the domain of building energy.
(1) SPR is used to predict energy consumption from building design and operational variables, outdoor environmental variables, and their interactions, based on the Department of Energy’s EnergyPlus data sets. SPR produces a high level of prediction accuracy and provides insights into the design, operation, and management of energy-efficient buildings. (2) TBSR is used to predict future temperature conditions, which could help decide whether to activate the Heating, Ventilation, and Air Conditioning (HVAC) systems in an energy-efficient manner.
Date Created
2018
Agent

Bayesian Network Approach to Assessing System Reliability for Improving System Design and Optimizing System Maintenance

Description
A quantitative analysis of a system that has a complex reliability structure always involves considerable challenges. This dissertation mainly addresses uncertainty inherent in complicated reliability structures that may cause unexpected and undesired results.

The reliability structure uncertainty cannot be handled by traditional reliability analysis tools such as Fault Tree and Reliability Block Diagram because of their deterministic Boolean logic. Therefore, I employ a Bayesian network, which provides a flexible modeling method for building a multivariate distribution. By representing a system reliability structure as a joint distribution, the uncertainty and correlations existing between the system’s elements can be modeled effectively in a probabilistic manner. This dissertation focuses on analyzing system reliability for the entire system life cycle, particularly the production stage and early design stages.

In the production stage, the research investigates a system that is continuously monitored by on-board sensors. By modeling the complex reliability structure with a Bayesian network integrated with various stochastic processes, I propose several methodologies that evaluate system reliability on a real-time basis and optimize maintenance schedules.

In early design stages, the research aims to predict system reliability based on the current system design and to improve the design if necessary. The three main challenges in this research are: 1) the lack of field failure data, 2) the complex reliability structure, and 3) how to effectively improve the design. To tackle these difficulties, I present several modeling approaches using Bayesian inference and a nonparametric Bayesian network, where the system is explicitly analyzed through sensitivity analysis. In addition, this modeling approach is enhanced by incorporating a temporal dimension. However, the nonparametric Bayesian network approach generally entails high computational effort, especially when a complex and large system is modeled. To alleviate this computational burden, I also suggest building a surrogate model with quantile regression.

In summary, this dissertation studies and explores the use of Bayesian networks in analyzing complex systems. All proposed methodologies are demonstrated by case studies.
Date Created
2018
Agent

Data-Driven Robust Optimization in Healthcare Applications

Description
Healthcare operations have enjoyed reduced costs, improved patient safety, and innovation in healthcare policy over a huge variety of applications by tackling problems via the creation and optimization of descriptive mathematical models to guide decision-making. Despite these accomplishments, models are stylized representations of real-world applications, reliant on accurate estimations from historical data to justify their underlying assumptions. To protect against unreliable estimations which can adversely affect the decisions generated from applications dependent on fully-realized models, techniques that are robust against misspecifications are utilized while still making use of incoming data for learning. Hence, new robust techniques are applied that (1) allow for the decision-maker to express a spectrum of pessimism against model uncertainties while (2) still utilizing incoming data for learning. Two main applications are investigated with respect to these goals, the first being a percentile optimization technique with respect to a multi-class queueing system for application in hospital Emergency Departments. The second studies the use of robust forecasting techniques in improving developing countries’ vaccine supply chains via (1) an innovative outside of cold chain policy and (2) a district-managed approach to inventory control. Both of these research application areas utilize data-driven approaches that feature learning and pessimism-controlled robustness.
Date Created
2018
Agent

Design and Mining of Health Information Systems for Process and Patient Care Improvement

Description
In healthcare facilities, health information systems (HISs) are used to serve different purposes. The radiology department adopts multiple HISs in managing their operations and patient care. In general, the HISs that touch radiology fall into two categories: tracking HISs and archive HISs. Electronic Health Records (EHR) is a typical tracking HIS, which tracks the care each patient receives at multiple encounters and facilities. Archive HISs are typically specialized databases to store large-size data collected as part of the patient care. A typical example of an archive HIS is the Picture Archive and Communication System (PACS), which provides economical storage and convenient access to diagnostic images from multiple modalities. How to integrate such HISs and best utilize their data remains a challenging problem due to the disparity of HISs as well as high-dimensionality and heterogeneity of the data. My PhD dissertation research includes three inter-connected and integrated topics and focuses on designing integrated HISs and further developing statistical models and machine learning algorithms for process and patient care improvement.

Topic 1: Design of super-HIS and tracking of quality of care (QoC). My research developed an information technology that integrates multiple HISs in radiology, and proposed QoC metrics defined upon the data that measure various dimensions of care. The DDD assisted the clinical practices and enabled an effective intervention for reducing lengthy radiologist turnaround times for patients.

Topic 2: Monitoring and change detection of QoC data streams for process improvement. With the super-HIS in place, high-dimensional data streams of QoC metrics are generated. I developed a statistical model for monitoring high-dimensional data streams that integrates Singular Value Decomposition (SVD) and process control. The algorithm was applied to QoC metrics data, and additionally extended to another application of monitoring traffic data in communication networks.

Topic 3: Deep transfer learning of archive HIS data for computer-aided diagnosis (CAD). The novelty of the CAD system is the development of a deep transfer learning algorithm that combines the ideas of transfer learning and multi-modality image integration under the deep learning framework. Our system achieved high accuracy in breast cancer diagnosis compared with conventional machine learning algorithms.
Date Created
2018
Agent

Development of Complementary Fresh-Food Systems Through the Exploration and Identification of Profit-Maximizing, Supply Chains

Description
One of the greatest 21st century challenges is meeting the needs of a growing world population expected to increase 35% by 2050 given projected trends in diets, consumption and income. This in turn requires a 70-100% improvement on current production capability, even as the world is undergoing systemic climate pattern changes. This growth not only translates to higher demand for staple products, such as rice, wheat, and beans, but also creates demand for high-value products such as fresh fruits and vegetables (FVs), fueled by better economic conditions and a more health-conscious consumer. These trends would seem to present opportunities for the economic development of environmentally well-suited regions to produce high-value products. Interestingly, many regions with production potential still exhibit a considerable gap between their current and ‘true’ maximum capability, especially in places where poverty is more common. Paradoxically, high-value horticultural products could often be produced in these regions if relatively small capital investments are made and proper marketing and distribution channels are created. The hypothesis is that small farmers within local agricultural systems are well positioned to take advantage of existing sustainable and profitable opportunities, specifically in high-value agricultural production. Unearthing these opportunities can entice investments in small farming development and help small farmers enter the horticultural industry, thus expanding the volume, variety and/or quality of products available for global consumption.
In this dissertation, the objective is three-fold: (1) to demonstrate the hidden production potential that exists within local agricultural communities, (2) to highlight the importance of supply chain modeling tools in the strategic design of local agricultural systems, and (3) to demonstrate the application of optimization and machine learning techniques to strategize the implementation of protective agricultural technologies.

As part of this dissertation, a yield approximation method is developed and integrated with a mixed-integer program to estimate a region’s potential to produce non-perennial, vegetable items. This integration offers practical approximations that help decision-makers identify technologies needed to protect agricultural production, alter harvesting patterns to better match market behavior, and provide an analytical framework through which external investment entities can assess different production options.
Date Created
2017
Agent

A Data Mining Approach to Modeling Customer Preference: A Case Study of Intel Corporation

Description
Understanding customer preference is crucial for new product planning and marketing decisions. This thesis explores how historical data can be leveraged to understand and predict customer preference. It presents a decision support framework that provides a holistic view of customer preference by following a two-phase procedure. Phase 1 uses cluster analysis to create product profiles, from which customer profiles are derived. Phase 2 then delves into each of the customer profiles and investigates the causality behind their preferences using Bayesian networks. This thesis illustrates the working of the framework using the case of Intel Corporation, the world’s largest semiconductor manufacturing company.
Date Created
2017
Agent

Network maintenance and capacity management with applications in transportation

Description
This research develops heuristics to manage both mandatory and optional network capacity reductions to better serve the network flows. The main application discussed relates to transportation networks, and flow cost relates to travel cost of users of the network. Temporary mandatory capacity reductions are required by maintenance activities. The objective of managing maintenance activities and the attendant temporary network capacity reductions is to schedule the required segment closures so that all maintenance work can be completed on time, and the total flow cost over the maintenance period is minimized for different types of flows. The goal of optional network capacity reduction is to selectively reduce the capacity of some links to improve the overall efficiency of user-optimized flows, where each traveler takes the route that minimizes the traveler’s trip cost. In this dissertation, both managing mandatory and optional network capacity reductions are addressed with the consideration of network-wide flow diversions due to changed link capacities.

This research first investigates the maintenance scheduling in transportation networks with service vehicles (e.g., truck fleets and passenger transport fleets), where these vehicles are assumed to take the system-optimized routes that minimize the total travel cost of the fleet. This problem is solved with the randomized fixed-and-optimize heuristic developed. This research also investigates the maintenance scheduling in networks with multi-modal traffic that consists of (1) regular human-driven cars with user-optimized routing and (2) self-driving vehicles with system-optimized routing. An iterative mixed flow assignment algorithm is developed to obtain the multi-modal traffic assignment resulting from a maintenance schedule. The genetic algorithm with multi-point crossover is applied to obtain a good schedule.

Building on Braess’ paradox, which holds that removing some links may alleviate the congestion of user-optimized flows, this research generalizes the paradox to reducing the capacity of selected links in order to improve the efficiency of the resultant user-optimized flows. A heuristic is developed to identify the links whose capacity should be reduced, and the corresponding capacity reduction amounts, to obtain more efficient total flows. Experiments on real networks demonstrate that the generalized Braess’ paradox exists in reality, and the heuristic developed solves real-world test cases even when commercial solvers fail.
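
As an illustration of the paradox itself (not of the dissertation's heuristic), the following minimal sketch works through the classic four-node textbook example. The network, the 4000 drivers, and the link cost functions are assumptions chosen for illustration and do not come from the research described here.

```python
# Classic textbook Braess network (illustrative assumption, not the author's test case):
#   Start->A costs load/100 minutes, A->End costs 45 minutes,
#   Start->B costs 45 minutes,       B->End costs load/100 minutes,
#   optional shortcut A->B costs 0 minutes.

def route_costs(n_top, n_bottom, n_shortcut):
    """Return per-driver travel time on the top, bottom, and shortcut routes."""
    load_start_a = n_top + n_shortcut    # drivers using link Start->A
    load_b_end = n_bottom + n_shortcut   # drivers using link B->End
    top = load_start_a / 100 + 45                      # Start->A->End
    bottom = 45 + load_b_end / 100                     # Start->B->End
    shortcut = load_start_a / 100 + load_b_end / 100   # Start->A->B->End
    return top, bottom, shortcut

# Without the shortcut, the user equilibrium splits the 4000 drivers evenly,
# so both routes cost the same and nobody gains by switching.
top_without, bottom_without, _ = route_costs(2000, 2000, 0)
cost_without = top_without

# With the shortcut, every driver taking it is an equilibrium: the shortcut
# route is still no slower than either alternative, yet everyone is worse off.
top_with, bottom_with, cost_with = route_costs(0, 0, 4000)

print(f"equilibrium travel time without shortcut: {cost_without} min")
print(f"equilibrium travel time with shortcut:    {cost_with} min")
```

In this toy network, closing the zero-cost link lowers every traveler's equilibrium trip time, which is the effect the capacity-reduction heuristic is described as exploiting on real networks.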
Date Created
2017
Agent

MRI-Based Texture Analysis to Differentiate Sinonasal Squamous Cell Carcinoma from Inverted Papilloma

Description
ABSTRACT BACKGROUND AND PURPOSE: Sinonasal inverted papilloma (IP) can harbor squamous cell carcinoma (SCC). Consequently, differentiating these tumors is important. The objective of this study was to determine if MRI-based texture analysis can differentiate SCC from IP and provide supplementary information to the radiologist. MATERIALS AND METHODS: Adult patients who had IP or SCC resected were eligible (coexistent IP and SCC were excluded). Inclusion required tumor size greater than 1.5 cm and a pre-operative MRI with axial T1, axial T2, and axial T1 post-contrast sequences. Five well-established texture analysis algorithms were applied to an ROI from the largest tumor cross-section. For a training dataset, machine-learning algorithms were used to identify the most accurate model, and performance was also evaluated in a validation dataset. Based on three separate blinded reviews of the ROI, isolated tumor, and entire images, two neuroradiologists predicted tumor type in consensus. RESULTS: The IP and SCC cohorts were matched for age and gender, while SCC tumor volume was larger (p=0.001). The best classification model achieved similar accuracies for training (17 SCC, 16 IP) and validation (7 SCC, 6 IP) datasets of 90.9% and 84.6% respectively (p=0.537). The machine-learning accuracy for the entire cohort (89.1%) was better than that of the neuroradiologist ROI review (56.5%, p=0.0004) but not significantly different from the neuroradiologist review of the tumors (73.9%, p=0.060) or entire images (87.0%, p=0.748). CONCLUSION: MRI-based texture analysis has potential to differentiate SCC from IP and may provide incremental information to the neuroradiologist, particularly for small or heterogeneous tumors.
Date Created
2016-12
Agent

Open-Source Feature Selection Tool for Medical Imaging Diagnosis

Description
Open source image analytics and data mining software are widely available but can be overly-complicated and non-intuitive for medical physicians and researchers to use. The ASU-Mayo Clinic Imaging Informatics Lab has developed an in-house pipeline to process medical images, extract imaging features, and develop multi-parametric models to assist disease staging and diagnosis. The tools have been extensively used in a number of medical studies including brain tumor, breast cancer, liver cancer, Alzheimer's disease, and migraine. Recognizing the need from users in the medical field for a simplified interface and streamlined functionalities, this project aims to democratize this pipeline so that it is more readily available to health practitioners and third party developers.
Date Created
2016-12
Agent

Multiple-Channel Detection in Active Sensing

Description
The problem of detecting the presence of a known signal in multiple channels of additive white Gaussian noise, such as occurs in active radar with a single transmitter and multiple geographically distributed receivers, is addressed via coherent multiple-channel techniques. A replica of the transmitted signal is treated as one channel in an M-channel detector, with the remaining M-1 channels comprised of data from the receivers. It is shown that the distribution of the eigenvalues of a Gram matrix is invariant to the presence of the signal replica on one channel, provided the other M-1 channels are independent and contain only white Gaussian noise. Thus, the thresholds representing false alarm probabilities for detectors based on functions of these eigenvalues remain valid when one channel is known not to contain only noise. The derivation is supported by results from Monte Carlo simulations. The performance of the largest eigenvalue as a detection statistic in the active case is examined and compared to the normalized matched filter detector in the two- and three-channel cases.
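
The invariance claim can be checked informally with a small Monte Carlo experiment. The sketch below is a simplified illustration under stated assumptions, not the authors' simulation: it uses real-valued data, only M = 2 channels, unit-normalized channels, and a hypothetical cosine waveform as the replica. For two unit-normalized channels the Gram matrix is [[1, g], [g, 1]], so its largest eigenvalue is 1 + |g|, where g is the normalized inner product.

```python
import math
import random

def largest_gram_eigenvalue(x, y):
    """Largest eigenvalue of the 2x2 Gram matrix of two unit-normalized channels."""
    norm_x = math.sqrt(sum(v * v for v in x))
    norm_y = math.sqrt(sum(v * v for v in y))
    g = sum(a * b for a, b in zip(x, y)) / (norm_x * norm_y)
    return 1.0 + abs(g)  # eigenvalues of [[1, g], [g, 1]] are 1 + g and 1 - g

def white_noise(n, rng):
    """One channel of n samples of unit-variance white Gaussian noise."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def simulate(trials, n, replica, rng):
    """Detection statistic when channel 1 is the replica (or noise if replica is None)."""
    stats = []
    for _ in range(trials):
        first = replica if replica is not None else white_noise(n, rng)
        stats.append(largest_gram_eigenvalue(first, white_noise(n, rng)))
    return stats

rng = random.Random(0)          # fixed seed so the run is repeatable
N, TRIALS = 16, 2000            # samples per channel, Monte Carlo trials
replica = [math.cos(0.3 * k) for k in range(N)]  # hypothetical known waveform

with_replica = simulate(TRIALS, N, replica, rng)
noise_only = simulate(TRIALS, N, None, rng)

mean_with = sum(with_replica) / TRIALS
mean_noise = sum(noise_only) / TRIALS
print(f"mean statistic, replica on channel 1: {mean_with:.3f}")
print(f"mean statistic, noise on both channels: {mean_noise:.3f}")
```

If the invariance holds, the two empirical means (and, more generally, the two empirical distributions) should agree up to Monte Carlo error, so a threshold calibrated on noise-only data would give the same false alarm probability when the replica occupies one channel.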
Date Created
2013-05
Agent