Yuksel Splines for Probabilistic Sequence Prediction

Description

Current methods for sequence prediction often fail to account for higher-order continuity, producing sequences that may be continuous but not physically viable; I investigate higher-order smoothness in terms of velocity and acceleration. Hence, I propose a Yuksel spline-based model that predicts curves guaranteed to be C^2 continuous while remaining efficient to compute. Characteristic properties of the model are demonstrated on toy examples and sequence prediction tasks.
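
A minimal sketch of the curve construction, assuming the variant of Yuksel's C^2 interpolating splines that trigonometrically blends overlapping quadratic Lagrange interpolants under uniform parameterization; the point layout and segment evaluation below are illustrative, not the thesis's actual model.

```python
import numpy as np

def quad_through(p0, p1, p2, u):
    # Quadratic Lagrange interpolant through p0, p1, p2 at u = 0, 0.5, 1.
    return (2 * (u - 0.5) * (u - 1) * p0
            - 4 * u * (u - 1) * p1
            + 2 * u * (u - 0.5) * p2)

def yuksel_segment(pm1, p0, p1, p2, t):
    # Segment from p0 to p1 (t in [0, 1]): blend the quadratic through
    # (pm1, p0, p1) with the one through (p0, p1, p2). The cos^2/sin^2
    # weighting is what makes the joined curve C^2 at the control points.
    theta = 0.5 * np.pi * t
    a = quad_through(pm1, p0, p1, 0.5 + 0.5 * t)  # tail half of first quadratic
    b = quad_through(p0, p1, p2, 0.5 * t)         # head half of second quadratic
    return np.cos(theta) ** 2 * a + np.sin(theta) ** 2 * b

# Evaluate the middle segment of a small 2D example curve.
pts = [np.array(p, dtype=float) for p in [(0, 0), (1, 2), (3, 3), (4, 1)]]
curve = np.array([yuksel_segment(*pts, t) for t in np.linspace(0.0, 1.0, 20)])
```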
Date Created
2024
Agent

On Stochastic Modeling Applications to Cybersecurity: Loss, Attack, and Detection

Description

The main objective of this work is to study novel stochastic modeling applications to cybersecurity across three dimensions: loss, attack, and detection. First, motivated by recent spatial stochastic models with cyber insurance applications, the first and second moments of the size of a typical cluster of bond percolation on finite graphs are studied. More precisely, given a finite graph where edges are independently open with the same probability $p$ and a vertex $x$ chosen uniformly at random, the goal is to find the first and second moments of the number of vertices in the cluster of open edges containing $x$. Exact expressions for the first and second moments of the size distribution of a bond percolation cluster are derived on essential building blocks of hybrid graphs: the ring, the path, the random star, and regular graphs. Upper bounds for the moments are obtained by a coupling argument comparing the percolation model with branching processes when the graph is a random rooted tree with a given offspring distribution and a given finite radius. Second, the Petri net modeling framework for performance analysis is well established, and its extensions provide enough flexibility to examine, via simulation, the behavior of a permissioned blockchain platform during an ongoing cyberattack. The relationship between system performance and cyberattack configuration is analyzed. The simulations vary the blockchain's parameters and network structure, revealing, through the impact on system performance, the factors that contribute positively or negatively to a Sybil attack. Lastly, the ability of denoising diffusion probabilistic models (DDPMs) to perform synthetic tabular data augmentation is studied. DDPMs surpass generative adversarial networks in improving computer vision classification tasks and in image generation (for example, Stable Diffusion). Recent research and open-source implementations point to strong quality of synthetic tabular data generation for classification and regression tasks. Unfortunately, the literature concerning tabular data augmentation with DDPMs for classification is lacking. Further, cyber datasets commonly have highly unbalanced class distributions that complicate training. Synthetic tabular data augmentation is therefore investigated with cyber datasets, and the performance of well-known metrics in machine learning classification tasks improves with augmentation and balancing.
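
As a concrete companion to the percolation part, the following sketch estimates the first two moments of the cluster size of a uniformly random vertex on the ring by Monte Carlo simulation. The graph size and edge probability are illustrative; the dissertation's exact closed-form expressions are not reproduced here.

```python
import numpy as np

def ring_cluster_size(n, p, rng):
    # Ring on n vertices: edge i joins vertex i and (i + 1) % n and is
    # open independently with probability p.
    open_edge = rng.random(n) < p
    if open_edge.all():
        return n  # every edge open: the whole ring is one cluster
    x = int(rng.integers(n))
    size, i = 1, x
    while open_edge[i]:            # walk clockwise from x
        size += 1
        i = (i + 1) % n
    i = (x - 1) % n
    while open_edge[i]:            # walk counterclockwise from x
        size += 1
        i = (i - 1) % n
    return size

rng = np.random.default_rng(0)
samples = np.array([ring_cluster_size(50, 0.4, rng) for _ in range(20000)])
print(samples.mean(), (samples ** 2).mean())  # estimates of E|C| and E|C|^2
```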
Date Created
2023
Agent

Comparative Analysis of the Hypergraph Transfer Protocol & Layer-2 Scalability Protocol

Description

Through my work with the Arizona State University Blockchain Research Lab (BRL) and JennyCo, one of the first HIPAA-compliant Healthcare Information (HCI) decentralized exchanges, I have had the opportunity to explore a unique cross-section of some of the most up-and-coming DLTs, including both DAGs and blockchains. During this research, four major technologies (including JennyCo's own systems) presented themselves as prime candidates for the comparative analysis of two models for implementing JennyCo's system architecture for the monetization of healthcare information exchanges (HIEs). These four technologies and their underlying mechanisms will be explored thoroughly throughout this paper and are listed with brief definitions as follows:

Polygon - "Polygon is a 'layer two' or 'sidechain' scaling solution that runs alongside the Ethereum blockchain. MATIC is the network's native cryptocurrency, which is used for fees, staking, and more" [8]. Polygon is the scalable layer in the L2SP architecture.

Ethereum - "Ethereum is a decentralized blockchain platform that establishes a peer-to-peer network that securely executes and verifies application code, called smart contracts" [9]. This foundational Layer-1 runs thousands of nodes and creates a unique decentralized ecosystem governed by Turing-complete automated programs. Ethereum is the foundational layer in the L2SP.

Constellation - A novel Layer-0, data-centric peer-to-peer network that utilizes the "Hypergraph Transfer Protocol, or HGTP, a DLT known as a [DAG] protocol with a novel reputation-based consensus model called Proof of Reputable Observation (PRO). Hypergraph is a feeless decentralized network that supports the transfer of $DAG cryptocurrency" [10].

JennyCo Protocol - Acts as a HIPAA-compliant decentralized HIE by allowing consumers, big businesses, and brands to access and exchange user health data on a secure, interoperable, and accessible platform via DLT. The JennyCo Protocol implements utility tokens to reward buyers and sellers for exchanging data. Its protocol nature comes from its DLT implementation, which governs the functioning of on-chain actions (e.g., smart contracts). In this case, these actions consist of secure, transparent health data exchange and monetization that restore data ownership to those who generate the data [11].

Having worked closely with multiple companies behind the technologies listed, I have been exposed to the benefits and deficits of each technology and its corresponding approach. In this paper, I will use my experience with these technologies and their frameworks to explore two distributed ledger architecture protocols and determine the more effective model for implementing JennyCo's health data exchange. I will begin with an exploration of blockchain and directed acyclic graph (DAG) technologies to better understand their innate architectures and features. I will then move to an in-depth look at layered protocols and at healthcare data in the form of EHRs. Additionally, I will address the main challenges EHRs and HIEs face, to give a deeper understanding of the problems JennyCo is attempting to solve.
Finally, I will defend my hypothesis: the Hypergraph Transfer Protocol (HGTP) model by Constellation presents significant advantages in scalability, interoperability, and external data security over the Layer-2 Scalability Protocol (L2SP) used by Polygon and Ethereum in implementing the JennyCo protocol. This will be done through a thorough breakdown of each protocol and an analysis of relevant criteria including, but not limited to, security, interoperability, and scalability. In doing so, I hope to determine the best framework for running JennyCo's HIE protocol.

Date Created
2023-05
Agent

A Comparative Analysis of Bitcoin Price Prediction Models

Description

Bitcoin (BTC) shares many characteristics with traditional stocks, but it is much more volatile since the cryptocurrency market is unregulated. The high volatility makes BTC a high-risk, high-reward investment, and predictive analysis can be very useful for obtaining good returns and minimizing risk. Taking Cocco et al. [1] as the primary reference, this thesis attempts to reproduce their findings by building two BTC price forecasting models, a Long Short-Term Memory (LSTM) network and a Bayesian Neural Network (BNN), and finds that the Mean Absolute Percentage Error (MAPE) is lower for the initial BNN model than for the initial LSTM model. In addition to forecasting the value of BTC, a metric called trend% is developed to gauge the models' ability to capture the direction in which the price moves from one timestep to the next, and it is used to compare trend prediction performance. Both initial models are found to make random predictions for the trend. With improvements such as removing the stochastic component from the data and forecasting returns rather than price values, the two models achieve comparable performance in terms of both MAPE and trend%. The thesis concludes by discussing future work that could improve the above models; one possibility mentioned is to use on-chain data from the BTC blockchain, coupled with real-world knowledge of BTC exchanges, as input features to the models.
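
For reference, a minimal sketch of the two evaluation metrics. The trend% definition below, the percentage of timesteps where the predicted move from the previous actual price has the same sign as the actual move, is an assumption about the thesis's metric; the exact formulation may differ.

```python
import numpy as np

def mape(actual, predicted):
    # Mean Absolute Percentage Error between actual and predicted prices.
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))

def trend_pct(actual, predicted):
    # Share of timesteps where the predicted direction of movement
    # (relative to the previous actual price) matches the actual direction.
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    actual_dir = np.sign(actual[1:] - actual[:-1])
    pred_dir = np.sign(predicted[1:] - actual[:-1])
    return 100.0 * np.mean(actual_dir == pred_dir)

prices = [100.0, 102.0, 101.0, 103.0]
forecast = [100.5, 101.5, 101.8, 102.5]
print(mape(prices, forecast), trend_pct(prices, forecast))
```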
Date Created
2022
Agent

SATLAB - An End to End framework for Labelling Satellite Images

Description

In this work, I propose a novel, unsupervised framework titled SATLAB to label satellite images, given a classification task at hand. Existing models for satellite image classification, such as DeepSAT and DeepSAT-V2, rely on deep learning models that are label-hungry and require a significant amount of training data. Since manual curation of labels is expensive, I ensure that SATLAB requires zero training labels. SATLAB can work in conjunction with several generative and unsupervised machine learning models by allowing them to be seamlessly plugged into its architecture. I devise three operating modes for SATLAB - manual, semi-automatic, and automatic - which require varying levels of human intervention in creating the domain-specific labeling functions for each image that can be utilized by candidate generative models such as Snorkel, as well as by other unsupervised learners in SATLAB. Unlike existing supervised learning baselines, which only extract textural features from satellite images, SATLAB supports the extraction of both textural and geospatial features, and I empirically show that geospatial features enhance the classification F1-score by 33%. I build SATLAB on top of Apache Sedona to leverage its rich set of spatial query processing operators for extracting geospatial features from satellite raster images. I evaluate SATLAB on a target binary classification task that distinguishes slum from non-slum areas, over a repository of 100K satellite images captured by the Sentinel satellite program. My 5-fold cross-validation (CV) experiments show that SATLAB achieves competitive F1-scores (0.6) using 0% labeled data, while the best supervised learning baseline achieves a 0.74 F1-score using 80% labeled data. I also show that Snorkel outperforms alternative generative and unsupervised candidate models that can be plugged into SATLAB by 33% to 71% w.r.t. F1-score and by 3x to 73x w.r.t. latency. Finally, downstream classifiers trained on the labels generated by SATLAB are comparable in quality (0.63 F1) to their counterparts trained on manually curated labels (0.74 F1).
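
To illustrate the plug-in labeling-function mechanism, here is a toy sketch in the style of the Snorkel labeling API. The feature names (texture_entropy, dist_to_road_m), thresholds, and labeling rules are hypothetical, not SATLAB's actual labeling functions.

```python
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

SLUM, NON_SLUM, ABSTAIN = 1, 0, -1

@labeling_function()
def lf_high_texture_entropy(x):
    # Hypothetical textural cue: dense informal settlements tend to
    # exhibit high texture entropy.
    return SLUM if x.texture_entropy > 0.7 else ABSTAIN

@labeling_function()
def lf_far_from_roads(x):
    # Hypothetical geospatial cue derived from a spatial join.
    return NON_SLUM if x.dist_to_road_m > 2000 else ABSTAIN

features = pd.DataFrame({
    "texture_entropy": [0.9, 0.3, 0.8],
    "dist_to_road_m": [150, 3000, 400],
})
L = PandasLFApplier([lf_high_texture_entropy, lf_far_from_roads]).apply(features)

# The generative label model denoises and combines the weak votes.
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L, n_epochs=200, seed=0)
weak_labels = label_model.predict(L)
```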
Date Created
2022
Agent

Novel NFT Minter: With Support From NuCypher

Description

This project aims to mint NFTs on the Ethereum blockchain with upgraded functionality. This functionality improves user verifiability and increases a user's control over their NFT.

Date Created
2022-05
Agent

GEM: An Efficient Entity Matching Framework for Geospatial Data

Description

The use of spatial data has become fundamental in today's world. From fitness trackers to food delivery services, almost all applications record users' location information and require clean geospatial data to enhance various features. As spatial data flows in from heterogeneous sources, various problems arise. Entity matching has been a crucial step in the process of producing clean, usable data. It is an amalgamation of various sub-processes, including blocking and matching; at the end of an entity matching pipeline, we get deduplicated records of the same real-world entity. Identifying various mentions of the same real-world location is known as spatial entity matching. While entity matching has received significant interest in the relational setting, the same cannot be said of spatial entity matching. In this dissertation, I build an end-to-end Geospatial Entity Matching framework, GEM, exploring spatial entity matching from a novel perspective. Current state-of-the-art systems perform spatial entity matching on only one geometrical data variant. Instead of confining matching to spatial entities of point geometry type, I extend the boundaries of spatial entity matching to the more generic polygon geometry entities as well. I propose a methodology that supports three entity matching scenarios across different geometrical data types: point X point, point X polygon, and polygon X polygon. As mentioned above, entity matching consists of various steps, but blocking, feature vector creation, and classification are the core steps of the system. GEM comprises an efficient and lightweight blocking technique, GeoPrune, that uses the geohash encoding mechanism to prune away obvious non-matching spatial entities. Geohashing is a technique that converts point location coordinates into an alphanumeric code string, and it proves to be very effective and swift for the blocking mechanism. I leverage the Apache Sedona engine to create the feature vectors. Apache Sedona is a spatial database management system capable of processing spatial SQL queries with multiple geometry types without compromising their original coordinate vector representation. In this step, I re-purpose the spatial proximity operators (SQL queries) in Apache Sedona to create spatial feature dimensions that capture the proximity between a geospatial entity pair. The last step of the entity matching process is matching, or classification. The classification step in GEM is a pluggable component which consumes the feature vector for a spatial entity pair and determines whether the geolocations match. The component provides three machine learning models that consume the same feature vector and provide a label for the test data based on the training. I conduct experiments with the three classifiers on multiple large-scale geospatial datasets consisting of both spatial and relational attributes; the data arrives from heterogeneous sources, and I pre-align its schema manually. GEM achieves an F-measure of 1.0 for a point X point dataset with 176k total pairs, which is 42% higher than a state-of-the-art spatial EM baseline. It achieves F-measures of 0.966 and 0.993 for the point X polygon dataset with 302M total pairs and the polygon X polygon dataset with 16M total pairs, respectively.
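
A minimal sketch of geohash-prefix blocking in the spirit of GeoPrune: a standard geohash encoder plus a grouping step. The precision and the entity layout are illustrative; GEM's actual interface and pruning rules may differ.

```python
from collections import defaultdict

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision=5):
    # Standard geohash encoding: interleave longitude/latitude bisection
    # bits, emitting one base-32 character per 5 bits.
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    code, bits, ch, even = [], 0, 0, True
    while len(code) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            ch = ch * 2 + (lon >= mid)
            lon_lo, lon_hi = (mid, lon_hi) if lon >= mid else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            ch = ch * 2 + (lat >= mid)
            lat_lo, lat_hi = (mid, lat_hi) if lat >= mid else (lat_lo, mid)
        even = not even
        bits += 1
        if bits == 5:
            code.append(_BASE32[ch])
            bits, ch = 0, 0
    return "".join(code)

def geoprune_blocks(entities, precision=5):
    # Group entity ids by geohash cell; candidate pairs are generated only
    # within a cell, pruning obvious non-matches before feature creation.
    blocks = defaultdict(list)
    for eid, (lat, lon) in entities.items():
        blocks[geohash(lat, lon, precision)].append(eid)
    return blocks

entities = {"a": (33.4255, -111.94), "b": (33.4260, -111.941), "c": (40.7128, -74.006)}
blocks = geoprune_blocks(entities)
# Nearby points usually land in the same cell; distant ones never do.
```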
Date Created
2021
Agent

Stability and Security of Distribution Networks with High-Penetration Renewables

Description

Rapid increases in the installed amounts of Distributed Energy Resources are forcing a paradigm shift to guarantee the stability, security, and economics of power distribution systems. This dissertation explores these challenges and proposes solutions to enable higher penetrations of grid-edge devices. The thesis shows that integrating Graph Signal Processing with the State Estimation formulation allows accurate estimation of voltage phasors for radial feeders under low-observability conditions using traditional measurements. Furthermore, the Optimal Power Flow formulation presented in this work can reduce the solution time of a bus injection-based convex relaxation formulation, as shown through numerical results. The enhanced real-time knowledge of the system state is leveraged to develop new approaches to the cyber-security of a transactive energy market by introducing a blockchain-based Electron Volt Exchange framework that includes a distributed protocol for pricing and scheduling prosumers' production/consumption while keeping constraints and bids private. The distributed algorithm prevents power theft and false data injection by comparing prosumers' reported power exchanges to models of expected power exchanges, using measurements from grid sensors to estimate the system state. Necessary hardware security is described and integrated into the underlying grid-edge devices to verify the provenance of messages to and from these devices. These preventive measures for securing energy transactions are accompanied by mitigation measures that maintain voltage stability in inverter-dominated networks, with local control actions derived through Lyapunov analysis to mitigate the effects of cyber-attacks and generation intermittency. The proposed formulation is applicable as long as the Volt-Var and Volt-Watt curves of the inverters can be characterized by Lipschitz constants. Simulation results demonstrate how smart inverters can mitigate voltage oscillations throughout the distribution network. The approaches are rigorously explored and validated using a combination of real distribution networks and synthetic test cases. Finally, to overcome the scarcity of real data for testing distribution system algorithms, a framework is introduced to generate synthetic distribution feeders mapped to real geospatial topologies using available OpenStreetMap data. The methods illustrate how to create synthetic feeders across an entire ZIP code with minimal input data for any location. These stackable scientific findings conclude with a brief discussion of physical deployment opportunities to accelerate grid modernization efforts.
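
To make the inverter-curve condition concrete, the sketch below defines a piecewise-linear Volt-Var droop curve and computes its Lipschitz constant (its steepest slope), the quantity such stability conditions bound. The breakpoints are hypothetical, loosely modeled on IEEE 1547-style defaults rather than taken from the dissertation.

```python
import numpy as np

# Hypothetical piecewise-linear Volt-Var droop curve in per-unit.
V_PTS = np.array([0.90, 0.95, 1.00, 1.05, 1.10])    # voltage breakpoints (p.u.)
Q_PTS = np.array([0.44, 0.44, 0.00, -0.44, -0.44])  # reactive power command (p.u.)

def volt_var(v):
    # Reactive power setpoint as a function of measured voltage.
    return np.interp(v, V_PTS, Q_PTS)

# The Lipschitz constant of the curve is its steepest slope; a Lyapunov-based
# stability condition constrains this value.
lipschitz = np.max(np.abs(np.diff(Q_PTS) / np.diff(V_PTS)))
print(volt_var(1.03), lipschitz)  # -> ~-0.264, 8.8
```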
Date Created
2021
Agent

A Verifiable Distributed Voting System Without a Trusted Party

Description

Cryptographic voting systems such as Helios rely heavily on a trusted party to maintain privacy or verifiability. This tradeoff can be done away with by using distributed substitutes for the components that need a trusted party. By replacing the encryption, shuffle, and decryption steps described by Helios with Pedersen threshold encryption and the Neff shuffle, it is possible to obtain a distributed voting system that achieves both privacy and verifiability without trusting any of the contributors. This thesis examines existing approaches to this problem and their shortcomings, and provides empirical metrics for comparing different working solutions in detail.
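
The threshold ingredient underlying Pedersen-style distributed decryption is t-of-n secret sharing. Below is a toy sketch of Shamir sharing with Lagrange reconstruction over a prime field; it shows only the reconstruction arithmetic, not the full Pedersen threshold encryption or the Neff shuffle.

```python
import random

P = 2**127 - 1  # Mersenne prime; toy field, not a production parameter

def share(secret, t, n):
    # Shamir t-of-n sharing: random polynomial of degree t-1 with f(0) = secret.
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def f(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation mod P
            acc = (acc * x + c) % P
        return acc
    return [(i, f(i)) for i in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation at x = 0 recovers the secret from any t shares.
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = share(123456789, t=3, n=5)
assert reconstruct(shares[:3]) == reconstruct(shares[2:]) == 123456789
```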
Date Created
2021
Agent

Medical Devices Digital Threads and their Supply-Chain Management on Blockchain

Description

Blockchain technology is defined as a decentralized, distributed ledger that records the origin of a digital asset and all of its updates without the need for any governing authority. Blockchain can be used very effectively in supply-chain management, leading to a more open and reliable supply chain, and in recent years different companies have begun to build blockchain-based supply chain solutions; blockchain has been shown to improve transparency across the supply chain. This research focuses on the supply chain management of medical devices and supplies using blockchain technology. These devices are manufactured by authorized device manufacturers and supplied to healthcare institutions on demand. The process is vulnerable because individual products are not tracked from the moment they are shipped until they are used; traceability of medical devices in this scenario is neither efficient nor trustworthy. To address this issue, the paper presents a blockchain-based solution for maintaining the supply chain of medical devices. The solution provides a distributed environment that can track various medical devices from production to use. The finished product is stored in the blockchain through its digital thread: details are added over time, recording the entire virtual life-cycle of the medical device. This digital thread adds traceability to the existing supply chain, and keeping track of devices also helps in returning expired devices to the manufacturer for recycling. The solution is composed of two main phases. The first is the design of the blockchain network architecture, including the required smart contracts, implemented on the secure network of Hyperledger Fabric (HLF). The second is the deployment of the generated network on Kubernetes to make the system scalable and more available. To demonstrate and evaluate the performance metrics, a prototype of the designed platform is implemented and deployed on Kubernetes. The research concludes with the benefits and shortcomings of the solution and with future directions for making the platform perform better in all aspects.
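
As an illustration of the digital-thread idea, here is a hypothetical sketch of the per-device record such a system might maintain: an append-only, hash-chained event log. It is a plain data-model mock-up for exposition, not Hyperledger Fabric chaincode.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import hashlib
import json

@dataclass
class DigitalThread:
    # Hypothetical per-device ledger record: an append-only, hash-chained
    # event log that mirrors what the chaincode would store on-chain.
    device_id: str
    events: list = field(default_factory=list)

    def record(self, actor: str, action: str) -> None:
        prev = self.events[-1]["hash"] if self.events else "0" * 64
        event = {
            "actor": actor,
            "action": action,
            "time": datetime.now(timezone.utc).isoformat(),
            "prev": prev,  # links each event to the one before it
        }
        event["hash"] = hashlib.sha256(
            json.dumps(event, sort_keys=True).encode()
        ).hexdigest()
        self.events.append(event)

thread = DigitalThread("pump-0042")
thread.record("manufacturer", "produced")
thread.record("distributor", "shipped")
thread.record("hospital", "received")  # ...later: "used" or "returned-expired"
```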
Date Created
2021
Agent