Recursive Bayesian Estimation on Projective Spaces: Theoretical Foundations and Practical Algorithms

Description
This thesis develops geometrically and statistically rigorous foundations for multivariate analysis and Bayesian inference posed on Grassmannian manifolds. Requisite to the development of key elements of statistical theory in a geometric realm are closed-form, analytic expressions for many differential geometric objects, e.g., tangent vectors, metrics, geodesics, volume forms. The first part of this thesis is devoted to a mathematical exposition of these. In particular, it leverages the classical work of Alan James to derive the exterior calculus of differential forms on special Grassmannians for invariant measures with respect to which integration is permissible. Motivated by various multi-sensor remote sensing applications, the second part of this thesis describes the problem of recursively estimating the state of a dynamical system propagating on the Grassmann manifold. Fundamental to the Bayesian treatment of this problem is the choice of a suitable probability distribution to a priori model the state. Using the Method of Maximum Entropy, maximum-entropy probability distributions on the state space are derived using the developed geometric theory. Statistical analyses of these distributions, including parameter estimation, are also presented. These probability distributions and the statistical analysis thereof are original contributions. Using the Bayesian framework, two recursive estimation algorithms, both of which rely on noisy measurements on (special cases of) the Grassmann manifold, are then devised and implemented numerically. The first is applied to an idealized scenario, the second to a more practically motivated scenario. The novelty of both of these algorithms lies in the use of the derived maximum-entropy probability measures as models for the priors. Numerical simulations demonstrate that, under mild assumptions, both estimation algorithms produce accurate and statistically meaningful outputs.
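The geodesics mentioned above can be computed in closed form on the Grassmann manifold. As an illustration only (not the thesis's own derivation), the sketch below implements the standard SVD-based geodesic between two subspaces via the principal-angle construction; the matrix sizes and bases are assumed for the example, and the principal angles are assumed to be below π/2 so the log map is well defined.

```python
import numpy as np

def grassmann_geodesic(X, Y, t):
    """Point at fraction t along the geodesic from span(X) to span(Y) on the
    Grassmann manifold, where X and Y are n x p orthonormal bases.

    Uses the standard principal-angle (SVD) construction: the tangent
    direction H from X toward Y satisfies H = U * tan(Theta) * V^T, and the
    geodesic is gamma(t) = X V cos(t*Theta) V^T + U sin(t*Theta) V^T.
    """
    M = X.T @ Y                               # assumed invertible (angles < pi/2)
    H = (Y - X @ M) @ np.linalg.inv(M)        # columns of H are orthogonal to X
    U, sigma, Vt = np.linalg.svd(H, full_matrices=False)
    theta = np.arctan(sigma)                  # principal angles between the subspaces
    return (X @ Vt.T * np.cos(t * theta) + U * np.sin(t * theta)) @ Vt
```

At t = 0 the formula returns X exactly, and at t = 1 it returns an orthonormal basis spanning the same subspace as Y, so interpolation between measured subspaces reduces to one SVD.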
This thesis aims to chart the interface between differential geometry and statistical signal processing. It is my deepest hope that the geometric-statistical approach underlying this work facilitates and encourages the development of new theories and new computational methods in geometry. Application of these, in turn, will bring new insights and better solutions to a number of extant and emerging problems in signal processing.
Date Created
2021
Agent

An Inquiry on the Philosophical Ideas of the Sublime and Beautiful and their Applications in the Quantum World

Description
In 1757 Edmund Burke published A Philosophical Enquiry into the Sublime and Beautiful. I will be extending his analysis of the sublime and beautiful, and using it to dissect quantum mechanics. Using Burke’s template on the sublime and beautiful, I can evaluate experiments in quantum mechanics, and explore a new side of Burke’s aesthetic theory. For the reader, I have outlined Burke’s aesthetic theory on the sublime and beautiful. I then used this analysis to explore quantum mechanics and assess the components of quantum mechanics that are beautiful and sublime.

Date Created
2022-05
Agent

Evaluating the utility of blood glycan levels as predictors of stage I adenocarcinoma using support vector machines

Description
Aberrant glycosylation has been shown to be linked to specific cancers, and using this idea, it was proposed that the levels of glycans in the blood could predict stage I adenocarcinoma. To track this glycosylation, glycans were broken down into glycan nodes via methylation analysis. This analysis utilized information from N-, O-, and lipid-linked glycans detected by gas chromatography-mass spectrometry. The resulting glycan node-ratios represent the initial quantitative data that were used in this experiment.
For this experiment, two sets of 50 µl blood plasma samples were provided by NYU Medical School. These samples were analyzed by Dr. Borges’s lab to obtain normalized biomarker levels from patients with stage I adenocarcinoma and from control patients matched for age, smoking status, and gender. An ROC curve was constructed under individual and paired conditions, and the AUC was calculated in Wolfram Mathematica 10.2. Methods such as increasing the size of the training set, using hard vs. soft margins, and processing biomarkers together and individually were used in order to increase the AUC. Using a soft margin for this particular data set proved most useful compared to the initial hard margin, raising the AUC from 0.6013 to 0.6585. Among the biomarkers, the 6-Glc/6-Man and 3,6-Gal glycan node ratios performed best, with an AUC of 0.7687, a sensitivity of 0.7684, and a specificity of 0.6051. While this is not enough accuracy to become a primary diagnostic tool for diagnosing stage I adenocarcinoma, the methods examined in this paper should be evaluated further. By comparison, the current clinical standard blood test for prostate cancer has an AUC of only 0.67.
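As a hedged illustration of the AUC figure of merit used above (not the thesis's Mathematica code), the area under the ROC curve can be computed directly from classifier scores via the Mann-Whitney statistic: it equals the probability that a randomly chosen positive case scores above a randomly chosen negative one, with ties counted half.

```python
import numpy as np

def roc_auc(scores_pos, scores_neg):
    """AUC via the Mann-Whitney statistic: P(positive score > negative score),
    counting ties as one half. Equivalent to integrating the ROC curve."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    # compare every positive against every negative score
    return float(np.mean(pos > neg) + 0.5 * np.mean(pos == neg))
```

For example, with positives scoring (0.9, 0.4) and negatives (0.5, 0.3), three of the four pairs are correctly ordered, giving an AUC of 0.75.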
Date Created
2016-05
Agent

Development and analysis of stochastic boundary coverage strategies for multi-robot systems

Description
Robotic technology is advancing to the point where it will soon be feasible to deploy massive populations, or swarms, of low-cost autonomous robots to collectively perform tasks over large domains and time scales. Many of these tasks will require the robots to allocate themselves around the boundaries of regions or features of interest and achieve target objectives that derive from their resulting spatial configurations, such as forming a connected communication network or acquiring sensor data around the entire boundary. We refer to this spatial allocation problem as boundary coverage. Possible swarm tasks that will involve boundary coverage include cooperative load manipulation for applications in construction, manufacturing, and disaster response.

In this work, I address the challenges of controlling a swarm of resource-constrained robots to achieve boundary coverage, which I refer to as the problem of stochastic boundary coverage. I first examined an instance of this behavior in the biological phenomenon of group food retrieval by desert ants, and developed a hybrid dynamical system model of this process from experimental data. Subsequently, with the aid of collaborators, I used a continuum abstraction of swarm population dynamics, adapted from a modeling framework used in chemical kinetics, to derive stochastic robot control policies that drive a swarm to target steady-state allocations around multiple boundaries in a way that is robust to environmental variations.
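The chemical-kinetics-style continuum abstraction described above can be sketched as a continuous-time Markov chain: each robot switches between boundary sites at fixed rates, and the target steady-state allocation is the stationary distribution of the rate matrix. The rates below are hypothetical examples, not values from the dissertation.

```python
import numpy as np

# Hypothetical per-robot transition rates between three boundary sites:
# K[i, j] (i != j) is the rate of switching from site i to site j; the
# diagonal makes each row sum to zero (a CTMC generator matrix).
K = np.array([[-0.30,  0.20,  0.10],
              [ 0.10, -0.25,  0.15],
              [ 0.05,  0.10, -0.15]])

# Steady-state allocation fractions pi solve pi @ K = 0 with sum(pi) = 1;
# append the normalization constraint and solve in the least-squares sense.
A = np.vstack([K.T, np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi = np.linalg.lstsq(A, b, rcond=None)[0]
```

The appeal of such policies is that each robot only needs its own switching rates, yet the population-level fractions converge to `pi` regardless of initial placement.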

Next, I determined the statistical properties of the random graph that is formed by a group of robots, each with the same capabilities, that have attached to a boundary at random locations. I also computed the probability density functions (pdfs) of the robot positions and inter-robot distances for this case.
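For intuition about the spacing statistics mentioned above, a small simulation can estimate the distribution of gaps between robots attached uniformly at random to a closed boundary. For n independent uniform attachment points on a loop of length L, each gap has survival function P(gap > x) = (1 - x/L)^(n-1). The parameters below are illustrative assumptions, not the dissertation's.

```python
import numpy as np

def gap_survival(n, x, L=1.0, trials=4000, seed=0):
    """Empirical P(gap > x) for the spacings between n points placed
    uniformly at random on a closed boundary (circle) of length L."""
    rng = np.random.default_rng(seed)
    pts = np.sort(rng.uniform(0, L, size=(trials, n)), axis=1)
    gaps = np.diff(pts, axis=1)                      # gaps between neighbors
    wrap = (L - pts[:, -1] + pts[:, 0])[:, None]     # gap across the "seam"
    return float(np.mean(np.hstack([gaps, wrap]) > x))
```

With n = 5 and x = 0.3L, the theory predicts 0.7^4 ≈ 0.24, which the simulation reproduces; such gap distributions feed directly into connectivity calculations for a given communication radius.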

I then extended this analysis to cases in which the robots have heterogeneous communication/sensing radii and attach to a boundary according to non-uniform, non-identical pdfs. I proved that these more general coverage strategies generate random graphs whose probability of connectivity is #P-hard to compute. Finally, I investigated possible approaches to validating our boundary coverage strategies in multi-robot simulations with realistic Wi-Fi communication.
Date Created
2016
Agent

Statistical and dynamical modeling of Riemannian trajectories with application to human movement analysis

Description
The data explosion in the past decade is in part due to the widespread use of rich sensors that measure various physical phenomena -- gyroscopes that measure orientation in phones and fitness devices, the Microsoft Kinect which measures depth information, etc. A typical application requires inferring the underlying physical phenomenon from data, which is done using machine learning. A fundamental assumption in training models is that the data is Euclidean, i.e., the metric is the standard Euclidean distance governed by the L-2 norm. However, in many cases this assumption is violated, when the data lies on non-Euclidean spaces such as Riemannian manifolds. While the underlying geometry accounts for the non-linearity, accurate analysis of human activity also requires temporal information to be taken into account. Human movement has a natural interpretation as a trajectory on the underlying feature manifold, as it evolves smoothly in time. A commonly occurring theme in many emerging problems is the need to represent, compare, and manipulate such trajectories in a manner that respects the geometric constraints. This dissertation is a comprehensive treatise on modeling Riemannian trajectories to understand and exploit their statistical and dynamical properties. Such properties allow us to formulate novel representations for Riemannian trajectories. For example, the physical constraints on human movement are rarely considered, which results in an unnecessarily large space of features, making search, classification, and other applications more complicated. Exploiting statistical properties can help us understand the true space of such trajectories. In applications such as stroke rehabilitation, where there is a need to differentiate between very similar kinds of movement, dynamical properties can be much more effective.
In this regard, we propose a generalization of the Lyapunov exponent to Riemannian manifolds and show its effectiveness for human activity analysis. The theory developed in this thesis naturally leads to several benefits in areas such as data mining, compression, dimensionality reduction, classification, and regression.
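A finite-time flavor of the manifold Lyapunov idea can be sketched by tracking how fast the geodesic separation of two nearby trajectories grows. The example below uses the unit sphere with its great-circle distance and synthetic trajectories, purely to illustrate the concept; it is not the dissertation's formulation.

```python
import numpy as np

def sphere_dist(x, y):
    """Great-circle (geodesic) distance between unit vectors on the sphere."""
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))

def lyapunov_estimate(traj_a, traj_b, dt=1.0):
    """Finite-time Lyapunov-style exponent: least-squares slope of the
    log geodesic separation between two nearby trajectories over time."""
    d = np.array([sphere_dist(a, b) for a, b in zip(traj_a, traj_b)])
    t = np.arange(len(d)) * dt
    return np.polyfit(t, np.log(d), 1)[0]   # slope of log-separation vs time
```

On trajectories whose angular separation grows like delta * exp(lambda * t), the estimator recovers lambda; a positive estimate flags sensitive, chaotic-looking movement, which is the kind of dynamical signature exploited for activity analysis.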
Date Created
2016
Agent

Statistical signal processing of ESI-TOF-MS for biomarker discovery

Description
Signal processing techniques have been used extensively in many engineering problems, and in recent years their application has extended to non-traditional research fields such as biological systems. Many of these applications require extraction of a signal or parameter of interest from degraded measurements. One such application is mass spectrometry immunoassay (MSIA), which has been one of the primary biomarker discovery techniques. MSIA analyzes protein molecules as potential biomarkers using time-of-flight mass spectrometry (TOF-MS). Peak detection in TOF-MS is important for biomarker analysis and many other MS-related applications. Though many peak detection algorithms exist, most of them are based on heuristic models. One way of detecting signal peaks is by deploying stochastic models of the signal and noise observations. The likelihood ratio test (LRT) detector, based on the Neyman-Pearson (NP) lemma, is a uniformly most powerful test for decision making in the form of a hypothesis test. The primary goal of this dissertation is to develop signal and noise models for electrospray ionization (ESI) TOF-MS data. A new method is proposed for developing the signal model by employing first-principles calculations based on device physics and molecular properties. The noise model is developed by analyzing MS data from careful experiments in the ESI mass spectrometer. A non-flat baseline in MS data is common, and the reasons behind its formation have not been fully understood. A new signal model explaining the presence of the baseline is proposed, though detailed experiments are needed to further substantiate the model assumptions. Signal detection schemes based on these signal and noise models are proposed. A maximum likelihood (ML) method is introduced for estimating the signal peak amplitudes. The performance of the detection methods and ML estimation is evaluated with Monte Carlo simulation, which shows promising results.
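For the textbook special case of a known peak amplitude in white Gaussian noise, the Neyman-Pearson LRT reduces to thresholding a linear statistic of the samples. The sketch below illustrates that reduction only; it is not the dissertation's ESI-TOF-MS signal or noise model, and the threshold would in practice be set from the desired false-alarm rate.

```python
import numpy as np

def lrt_detect(x, mu, sigma, threshold):
    """Log-likelihood ratio test for a known amplitude in white Gaussian noise:
    H1: x_i ~ N(mu, sigma^2)  vs  H0: x_i ~ N(0, sigma^2).
    Returns (detected, llr); per the Neyman-Pearson lemma, thresholding the
    LLR maximizes detection probability at a fixed false-alarm rate."""
    x = np.asarray(x, dtype=float)
    # the LLR is linear in the data for Gaussian hypotheses with equal variance
    llr = np.sum(mu * x / sigma**2 - mu**2 / (2 * sigma**2))
    return bool(llr > threshold), float(llr)
```

Because the LLR is monotone in the correlation of the data with the known signal shape, the detector is equivalent to a matched filter followed by a threshold.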
An application of these methods is proposed for fractional abundance calculation in biomarker analysis; it is mathematically robust and fundamentally different from the current algorithms. Biomarker panels for type 2 diabetes and cardiovascular disease are analyzed using existing MS analysis algorithms. Finally, a support vector machine based multi-classification algorithm is developed for evaluating the biomarkers' effectiveness in discriminating type 2 diabetes and cardiovascular disease, and is shown to perform better than a linear discriminant analysis based classifier.
Date Created
2012
Agent

Bayesian networks and gaussian mixture models in multi-dimensional data analysis with application to religion-conflict data

Description
This thesis examines the application of statistical signal processing approaches to data arising from surveys intended to measure psychological and sociological phenomena underpinning human social dynamics. The use of signal processing methods for analysis of signals arising from measurement of social, biological, and other non-traditional phenomena has been an important and growing area of signal processing research over the past decade. Here, we explore the application of statistical modeling and signal processing concepts to data obtained from the Global Group Relations Project, specifically to understand and quantify the effects and interactions of social psychological factors related to intergroup conflicts. We use Bayesian networks, modeled by directed acyclic graphs, to specify prospective models of conditional dependence between social psychological factors and conflict variables, with the significant interactions modeled as conditional probabilities. Since the data are sparse and multi-dimensional, we regress Gaussian mixture models (GMMs) against the data to estimate the conditional probabilities of interest. The parameters of the GMMs are estimated using the expectation-maximization (EM) algorithm. However, the EM algorithm may suffer from over-fitting due to the high dimensionality and limited observations of this data set. Therefore, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are used for GMM order estimation. To assist intuitive understanding of the interactions of social variables and the intergroup conflicts, we introduce a color-based visualization scheme in which the intensities of colors are proportional to the conditional probabilities observed.
Date Created
2012
Agent

Micro-particle streak velocimetry: theory, simulation methods and applications

Description
This dissertation describes a novel, low-cost strategy of using particle streak (track) images for accurate micro-channel velocity field mapping. It is shown that 2-dimensional, 2-component fields can be efficiently obtained using the spatial variation of particle track lengths in micro-channels. The velocity field is a critical performance feature of many microfluidic devices. Since it is often the case that un-modeled micro-scale physics frustrates principled design methodologies, particle-based velocity field estimation is an essential design and validation tool. Current technologies that achieve this goal use particle constellation correlation strategies and rely heavily on costly, high-speed imaging hardware. The proposed image/video-processing-based method achieves comparable accuracy for a fraction of the cost. In the context of micro-channel velocimetry, the usability of particle streaks has been poorly studied so far; their use has remained restricted mostly to bulk flow measurements and occasional ad-hoc uses in microfluidics. A second look at their usability in this work reveals that particle streak lengths can be used efficiently, approximately 15 years after their first use for micro-channel velocimetry. Particle tracks in steady, smooth microfluidic flows are mathematically modeled, and a framework for using experimentally observed particle track lengths for local velocity field estimation is introduced, followed by algorithm implementation and quantitative verification. Further, experimental considerations and image processing techniques that can facilitate the proposed methods are also discussed in this dissertation. The unavailability of benchmarked particle track image data motivated the implementation of a simulation framework capable of generating exposure-time-controlled particle track image sequences for given velocity vector fields.
This dissertation describes this framework and shows that arbitrary velocity fields designed in computational fluid dynamics software tools can be used to obtain such images. Apart from aiding gold-standard data generation, such images would find use for quick microfluidic flow field visualization and help improve device designs.
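The core streak-length-to-speed conversion can be sketched in a few lines. The pixel scale, exposure time, and diameter correction below are illustrative assumptions, not the dissertation's calibration; the correction reflects the common observation that a streak image spans the particle's path plus roughly one particle-image diameter.

```python
def streak_velocity(streak_len_px, particle_diam_px, exposure_s, px_per_m):
    """Local speed estimate from one particle streak in a steady, smooth flow:
    subtract the particle-image diameter from the streak length, convert
    pixels to metres, and divide by the exposure time."""
    path_m = (streak_len_px - particle_diam_px) / px_per_m
    return path_m / exposure_s
```

For example, a 55 px streak from a 5 px particle at 10^5 px/m over a 10 ms exposure corresponds to a local speed of 0.05 m/s; mapping this over many streaks yields the 2-D, 2-component field described above.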
Date Created
2011
Agent

Characterization and analysis of a novel platform for profiling the antibody response

Description
Immunosignaturing is a new immunodiagnostic technology that uses random-sequence peptide microarrays to profile the humoral immune response. Though the peptides have little sequence homology to any known protein, binding of serum antibodies may be detected, and the pattern correlated to disease states. The aim of my dissertation is to analyze the factors affecting the binding patterns using monoclonal antibodies and determine how much information may be extracted from the sequences. Specifically, I examined the effects of antibody concentration, competition, peptide density, and antibody valence. Peptide binding could be detected at the low concentrations relevant to immunosignaturing, and a monoclonal’s signature could even be detected in the presence of a 100-fold excess of naive IgG. I also found that peptide density was important, but this effect was not due to bivalent binding. Next, I examined in more detail how a polyreactive antibody binds to the random-sequence peptides compared to protein-sequence-derived peptides, and found that it bound to many peptides from both sets, but with low apparent affinity. An in-depth look at the peptide physicochemical properties and sequence complexity revealed some correlations with properties, but they were generally small and varied greatly between antibodies. However, on a larger but less diverse peptide library, I found that sequence complexity was important for antibody binding. The redundancy of that library did enable the identification of specific sub-sequences recognized by an antibody. The current immunosignaturing platform has little repetition of sub-sequences, so I evaluated several methods to infer antibody epitopes. I found two methods that had modest prediction accuracy, and I developed a software application called GuiTope to facilitate the epitope prediction analysis. None of the methods had sufficient accuracy to identify an unknown antigen from a database.
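A crude stand-in for the sub-sequence matching idea behind epitope inference is a shared k-mer count between a peptide and a candidate antigen sequence. This toy function is not GuiTope's algorithm (which is not specified here); it only illustrates why sub-sequence repetition in a library makes such inference possible.

```python
def shared_kmers(peptide, antigen, k=4):
    """Count the peptide's k-length sub-sequences (k-mers) that also occur
    somewhere in the antigen sequence; a rough sub-sequence match score."""
    antigen_kmers = {antigen[i:i + k] for i in range(len(antigen) - k + 1)}
    return sum(peptide[i:i + k] in antigen_kmers
               for i in range(len(peptide) - k + 1))
```

Ranking candidate antigens by such a score is the simplest form of the database-search problem described above; the dissertation's point is that random-sequence libraries give too few repeated sub-sequences for this to identify an unknown antigen reliably.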
In conclusion, the characteristics of the immunosignaturing platform observed through monoclonal antibody experiments demonstrate its promise as a new diagnostic technology. However, a major limitation is the difficulty in connecting the signature back to the original antigen, though larger peptide libraries could facilitate these predictions.
Date Created
2011
Agent

Opportunistic scheduling, cooperative relaying and multicast in wireless networks

Description
This dissertation builds a clear understanding of the role of information in wireless networks, and devises adaptive strategies to optimize the overall performance. The meaning of information ranges from channel/network states to the structure of the signal itself. Under the common thread of characterizing the role of information, this dissertation investigates opportunistic scheduling, relaying, and multicast in wireless networks. To assess the role of channel state information, the problem of distributed opportunistic scheduling (DOS) with incomplete information is considered for ad-hoc networks in which many links contend for the same channel using random access. The objective is to maximize the system throughput. In practice, link state information is noisy and may result in throughput degradation. Therefore, refining the state information by additional probing can improve the throughput, but at the cost of further probing. Capitalizing on optimal stopping theory, the optimal scheduling policy is shown to be threshold-based and is characterized by either one or two thresholds, depending on network settings. To understand the benefits of side information in cooperative relaying scenarios, a basic model is explored for two-hop transmissions of two information flows which interfere with each other. While the first hop is a classical interference channel, the second hop can be treated as an interference channel with transmitter side information. Various cooperative relaying strategies are developed to enhance the achievable rate. In another context, a simple sensor network is considered, where a sensor node acts as a relay and aids the fusion center in detecting an event. Two relaying schemes are considered: analog relaying and digital relaying. Sufficient conditions are provided for the optimality of analog relaying over digital relaying in this network. To illustrate the role of information about the signal structure in joint source-channel coding, multicast of compressible signals over lossy channels is studied. The focus is on the network outage from the perspective of signal distortion across all receivers.
Based on extreme value theory, the network outage is characterized in terms of key parameters. A new method using subblock network coding is devised, which prioritizes resource allocation based on the signal information structure.
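The threshold-based stopping policy for DOS can be illustrated with a toy simulation: keep probing the channel (at a time cost) until the observed rate clears a threshold, then transmit at that rate. The i.i.d. uniform rate model, probe cost, and transmission time below are assumptions for the sketch, not the dissertation's system model.

```python
import numpy as np

def throughput(threshold, probe_cost=0.1, data_time=1.0, trials=10000, seed=0):
    """Average throughput of a threshold stopping policy: probe repeatedly
    (each probe costs probe_cost seconds) until the observed channel rate
    reaches the threshold, then transmit at that rate for data_time seconds."""
    rng = np.random.default_rng(seed)
    total_bits = total_time = 0.0
    for _ in range(trials):
        elapsed = 0.0
        while True:
            rate = rng.uniform()        # hypothetical i.i.d. channel-rate draw
            elapsed += probe_cost
            if rate >= threshold:
                total_bits += rate * data_time
                total_time += elapsed + data_time
                break
    return total_bits / total_time
```

Under this model, a moderate threshold (around 0.6) trades extra probing time for a much better transmission rate and beats the greedy "transmit on the first probe" policy, which is the qualitative behavior that optimal stopping theory makes precise.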
Date Created
2011
Agent