Multi-Variant Spatially Informed Rapid Testing for Epidemic Model

Description
The COVID-19 outbreak that started in 2020 brought the world to its knees and is still a menace after three years. Over eighty-five million cases and over a million deaths have occurred due to COVID-19 during that time in the United States alone. A great deal of research has gone into building epidemic models that show the impact of the virus by plotting the cases, deaths, and hospitalizations due to COVID-19. However, very little research deals with mapping the different variants of COVID-19. SARS-CoV-2, the virus that causes COVID-19, constantly mutates, and multiple variants have emerged over time. The major variants include Beta, Gamma, Delta, and the most recent one, Omicron. The purpose of the research in this thesis is to modify one such epidemic model, the Spatially Informed Rapid Testing for Epidemic Model (SIRTEM), so that multiple variants of the virus are modeled at the same time. The model is assessed by adding the Omicron and Delta variants, so that the effects of different variants can be studied by looking at the positive cases, hospitalizations, and deaths from both variants for the Arizona population. The focus is on finding the best infection rate and testing rate by using random numbers, so that the mean square error between the published positive cases and the positive cases derived from the model is minimized.
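The parameter search described above can be sketched as a random search that keeps the infection rate and testing rate with the lowest mean square error. This is only an illustration: `simulate_positive_cases` is a hypothetical toy SIR-style model, not SIRTEM, the parameter ranges are assumptions, and the "published" series is synthetic data generated by the toy model itself.

```python
import random


def simulate_positive_cases(infection_rate, testing_rate, days=30,
                            population=1_000_000, initial_infected=100,
                            recovery_rate=0.1):
    """Toy SIR-style stand-in for an epidemic model (NOT SIRTEM itself).

    Returns a list with the detected positive cases for each day.
    """
    susceptible = population - initial_infected
    infected = float(initial_infected)
    detected = []
    for _ in range(days):
        new_infections = infection_rate * susceptible * infected / population
        new_infections = min(new_infections, susceptible)
        recoveries = recovery_rate * infected
        susceptible -= new_infections
        infected += new_infections - recoveries
        # Assumption: a fixed fraction of current infections tests positive.
        detected.append(testing_rate * infected)
    return detected


def mean_squared_error(observed, predicted):
    """Average squared difference between two equally long series."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def random_search(observed, trials=2000, seed=0):
    """Draw random (infection rate, testing rate) pairs and keep the best."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        infection_rate = rng.uniform(0.05, 1.0)   # assumed search range
        testing_rate = rng.uniform(0.01, 0.5)     # assumed search range
        predicted = simulate_positive_cases(infection_rate, testing_rate,
                                            days=len(observed))
        error = mean_squared_error(observed, predicted)
        if best is None or error < best[0]:
            best = (error, infection_rate, testing_rate)
    return best


# Synthetic placeholder for the published case counts (not real data).
published = simulate_positive_cases(0.3, 0.2)
best_error, best_infection_rate, best_testing_rate = random_search(published)
print(best_error, best_infection_rate, best_testing_rate)
```

With real data, `published` would be replaced by the reported daily positive cases, and the toy model by the actual epidemic model.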
Date Created
2022

On Processing Spatial Queries in Graph Database Management Systems

Description
Spatial data is fundamental in many applications such as map services and land resource management. Spatial data also inherently comes with abundant context information because spatial entities themselves possess different properties, e.g., graph or textual information. Among all these compound spatial data, geospatial graph data is one of the most challenging because of the complexity of graph data. Graph data is commonly used to model real scenarios, and searching for matching subgraphs is fundamental in retrieving and analyzing graph data. With the ubiquity of spatial data, vertices or edges in graphs are enriched with spatial location attributes side by side with other non-spatial attributes. Graph-based applications integrate spatial data into the graph model and provide more spatially aware services. The co-existence of graph and spatial data in the same geospatial graph enables new applications. To solve the new problems in these applications, existing solutions develop an integrated system that incorporates graph database and spatial database engines. However, existing approaches suffer from an architecture in which graph data and spatial data are isolated. In this dissertation, I explain two indexing frameworks, GeoReach and RisoTree, which can significantly accelerate queries in geospatial graphs. GeoReach includes a query operator that adds spatial data awareness to a graph database management system. In GeoReach, the neighborhood spatial information is summarized and stored on each vertex in the graph. The summarization includes three different structures according to the location distribution. These spatial summaries are utilized to terminate the graph search early. RisoTree is a hierarchical tree structure where each node is represented by a minimum bounding rectangle (MBR). The MBR of a node is a rectangle that encloses all its children.
A key difference between RisoTree and the R-tree is that RisoTree attaches pre-materialized subgraph information to each index node. The subgraph information is utilized during the spatial index search phase to prune search paths that cannot satisfy the query graph pattern. The RisoTree index thus reduces the search space during the spatial filtering phase at relatively light cost.
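The MBR and pruning ideas can be illustrated with a minimal sketch. The names `Rect`, `IndexNode`, and `can_prune` are hypothetical, and the set of labels is an assumed, heavily simplified stand-in for the pre-materialized subgraph information; the abstract does not describe RisoTree's actual structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def intersects(self, other):
        """True if the two rectangles share at least one point."""
        return (self.min_x <= other.max_x and other.min_x <= self.max_x and
                self.min_y <= other.max_y and other.min_y <= self.max_y)


def mbr(rects):
    """Smallest axis-aligned rectangle that encloses every rectangle given."""
    rects = list(rects)
    return Rect(min(r.min_x for r in rects), min(r.min_y for r in rects),
                max(r.max_x for r in rects), max(r.max_y for r in rects))


@dataclass
class IndexNode:
    children_rects: list
    # Hypothetical stand-in for pre-materialized subgraph information:
    # labels reachable from the spatial entities below this node.
    labels: frozenset

    @property
    def box(self):
        return mbr(self.children_rects)


def can_prune(node, query_window, required_labels):
    """Skip a node whose MBR misses the window or that lacks a required label."""
    return (not node.box.intersects(query_window)
            or not required_labels <= node.labels)


node = IndexNode([Rect(0, 0, 1, 1), Rect(2, 2, 3, 4)],
                 frozenset({"restaurant", "road"}))
print(node.box)
print(can_prune(node, Rect(1, 1, 2, 2), frozenset({"road"})))
print(can_prune(node, Rect(1, 1, 2, 2), frozenset({"hospital"})))
```

The second check is what distinguishes this sketch from a plain R-tree lookup: a node can be discarded even when its rectangle overlaps the query window.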
Date Created
2021

On Density and Noise Challenges in Tensor-Based Data Analytics

Description
Many real-world problems, such as model- and data-driven computer simulation analysis, social and collaborative network analysis, brain data analysis, and so on, benefit from jointly modeling and analyzing the underlying patterns associated with complex, multi-relational data. Tensor decomposition is an ideal mathematical tool for this joint modeling, due to its simultaneous analysis of such multi-relational data, which is made possible by the data's multidimensional, array-based nature. A major challenge in tensor decomposition lies with its computational and space complexity, especially for dense datasets. While the process is comparatively faster for sparse tensors, decomposition is still a major bottleneck for many applications. The tensor decomposition process results in dense (hence, large) intermediate results, even when the input tensor is sparse (or small). Noise is another challenge for most data mining techniques, and many tensor decomposition schemes are sensitive to noisy datasets; this is an inevitable problem for real-world data, which can lead to false conclusions. In this dissertation, I develop innovative tensor decomposition algorithms for mining both sparse and dense multi-relational data in a noise-resistant way. I present novel, scalable, parallelizable tensor decomposition algorithms, specifically tuned to be effective for dense, noisy tensors, and which maintain the quality of the resulting analysis. Furthermore, I present results on multi-relational data applications focusing on model- and data-driven computer simulation analysis, as well as social network and web mining, which demonstrate the effectiveness of these tensor decompositions.
Date Created
2019

MedFabric4Me: Blockchain Based Patient Centric Electronic Health Records System

Description
Blockchain technology enables a distributed and decentralized environment without any central authority. Healthcare is one industry in which blockchain is expected to have a significant impact. In recent years, the Healthcare Information Exchange (HIE) has been shown to benefit the healthcare industry remarkably. It has also been shown that blockchain could help improve multiple aspects of the HIE system.

Only a few systems have been proposed that combine blockchain technology with HIE, and they all suffer from the following two problems. First, the existing systems are not patient-centric in terms of data governance. Patients do not own their data and have no direct control over it. Second, there is no defined protocol among different systems on how to share sensitive data.

To address the issues mentioned above, this paper proposes MedFabric4Me, a blockchain-based platform for HIE. MedFabric4Me is a patient-centric system where patients own their healthcare data and share it on a need-to-know basis. First, the requirements for a patient-centric system that ensures tamper-proof sharing of data among participants are analyzed. Based on the analysis, a Merkle root based mechanism is created to ensure that data has not been tampered with. Second, a distributed proxy re-encryption system is used for secure encryption of data during storage and sharing of records. Third, off-chain storage and on-chain access management are combined for both authenticability and privacy.
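A Merkle root based tamper check can be sketched as follows. The hash function (SHA-256) and the convention of duplicating the last hash on odd-sized levels are assumptions made for illustration; the abstract does not specify how MedFabric4Me builds its tree.

```python
import hashlib


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(records):
    """Return the hex Merkle root of a non-empty list of byte strings."""
    if not records:
        raise ValueError("at least one record is required")
    level = [_sha256(record) for record in records]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumed convention: duplicate the last hash on odd-sized levels.
            level.append(level[-1])
        level = [_sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0].hex()


# Made-up record contents, for illustration only.
records = [b"visit-2020-01-03", b"lab-result-17", b"prescription-4"]
stored_root = merkle_root(records)

tampered = [b"visit-2020-01-03", b"lab-result-18", b"prescription-4"]
print(merkle_root(records) == stored_root)    # unchanged data matches
print(merkle_root(tampered) == stored_root)   # any change alters the root
```

Storing only the root where it cannot be altered, while keeping the records elsewhere, lets any participant recompute the root later and detect modification.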

MedFabric4Me is a two-pronged solution platform, composed of on-chain and off-chain components. The on-chain solution is implemented on the secure network of Hyperledger Fabric (HLF), while the off-chain solution uses the InterPlanetary File System (IPFS) to store data securely. NuCypher, an Ethereum-based proxy re-encryption network, provides cryptographic access controls to actors for encrypted data sharing.

To demonstrate practicality and scalability, a prototype of MedFabric4Me is implemented and its performance is evaluated against an already implemented HIE.

Results show that a decentralization technology like blockchain could help mitigate some issues that HIE faces today, such as limited transparency for patients, slow emergency response, and inadequate access control.

Finally, the research concludes with the benefits and shortcomings of MedFabric4Me, along with directions for future work that could improve its operation and performance.
Date Created
2020

Enabling Peer to Peer Energy Trading Marketplace Using Consortium Blockchain Networks

Description
Blockchain technology enables peer-to-peer transactions by eliminating the need for a centralized entity governing consensus. Rather than residing in a centralized database, the data is distributed across multiple computers, which enables crash fault tolerance and, thanks to a distributed consensus algorithm, makes the system difficult to tamper with.

In this research, the potential of blockchain technology to manage energy transactions is examined. The energy production landscape is being reshaped by distributed energy resources (DERs): photovoltaic panels, electric vehicles, smart appliances, and battery storage. Distributed energy sources such as microgrids, household solar installations, community solar installations, and plug-in hybrid vehicles enable energy consumers to act as providers of energy themselves, hence acting as 'prosumers' of energy.

Blockchain technology facilitates managing the transactions between the involved prosumers using 'Smart Contracts' by tokenizing energy into assets. Better utilization of grid assets lowers costs and also presents the opportunity to buy energy at a reasonable price while staying connected with the utility company. This technology acts as a backbone for two models applicable to a transactional energy marketplace, viz. 'Real-Time Energy Marketplace' and 'Energy Futures'. In the first model, the prosumers are given a choice to bid a price for energy within a stipulated period of time, while the utility company acts as an operating entity. In the second model, the marketplace is more liberal, and the utility company is not involved as an operator. The utility company provides infrastructure and manages accounts for all users, but does not endorse or govern transactions related to energy bidding. These smart contracts are not time-bound and can be suspended by the utility during periods of network instability.
Date Created
2019

Digital Fountain for Multi-node Aggregation of Data in Blockchains

Description
Blockchain scalability is one of the issues that concern its current adopters. The currently popular blockchains were initially designed with imperfections that introduce fundamental bottlenecks, which limit their ability to achieve higher throughput and lower latency.

One of the major bottlenecks for existing blockchain technologies is fast block propagation. Faster block propagation enables a miner to reach a majority of the network within a time constraint, thereby leading to a lower orphan rate and better profitability. In order to attain a throughput that could compete with current state-of-the-art transaction processing, while also keeping the block interval the same as today, a 24.3 gigabyte block would be required every 10 minutes with an average transaction size of 500 bytes, which translates to 48,600,000 transactions every 10 minutes, or about 81,000 transactions per second.
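The throughput figures above can be checked with a few lines of arithmetic. The only assumption is that a gigabyte is taken as 10^9 bytes, which is the reading that reproduces the numbers stated in the abstract.

```python
# Assumption: 1 gigabyte = 10**9 bytes.
BLOCK_SIZE_BYTES = 24_300_000_000      # 24.3 GB block
AVG_TRANSACTION_BYTES = 500            # average transaction size
BLOCK_INTERVAL_SECONDS = 10 * 60       # one block every 10 minutes

transactions_per_block = BLOCK_SIZE_BYTES // AVG_TRANSACTION_BYTES
transactions_per_second = transactions_per_block / BLOCK_INTERVAL_SECONDS

print(transactions_per_block)    # 48600000
print(transactions_per_second)   # 81000.0
```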

In order to synchronize such large blocks faster across the network while maintaining consensus by keeping the orphan rate below 50%, the thesis proposes to aggregate partial block data from multiple nodes using digital fountain codes. The advantage of using a fountain code is that all connected peers can send part of the data in an encoded form. When the receiving peer has enough data, it decodes the information to reconstruct the block. In addition to each peer sending only part of the information, the data can be relayed over UDP instead of TCP, improving on the propagation speed of current blockchains. The fountain codes applied in this research are Raptor codes, which allow the construction of a practically unlimited number of encoding symbols. When applied to blockchains, the research increases the success rate of block delivery on decode failures.
Date Created
2018

Query Workload-Aware Index Structures for Range Searches in 1D, 2D, and High-Dimensional Spaces

Description
Most current database management systems are optimized for single query execution. Yet, often, queries come as part of a query workload. Therefore, there is a need for index structures that can take into consideration the existence of multiple queries in a query workload and efficiently produce accurate results for the entire query workload. These index structures should be scalable to handle large amounts of data as well as large query workloads.

The main objective of this dissertation is to design and create scalable index structures that are optimized for range query workloads. Range queries are an important type of query with wide-ranging applications. There are no existing index structures that are optimized for efficient execution of range query workloads. There are also unique challenges that need to be addressed for range queries in 1D, 2D, and high-dimensional spaces. In this work, I introduce novel cost models, index selection algorithms, and storage mechanisms that can tackle these challenges and efficiently process a given range query workload in 1D, 2D, and high-dimensional spaces. In particular, I introduce the index structures HCS (for 1D spaces), cSHB (for 2D spaces), and PSLSH (for high-dimensional spaces), which are designed specifically to efficiently handle range query workloads and the unique challenges arising from their respective spaces. I experimentally show the effectiveness of the proposed index structures by comparing them with state-of-the-art techniques.
Date Created
2017

Client-driven dynamic database updates

Description
This thesis addresses the problem of online schema updates, where the goal is to update relational database schemas without reducing the database system's availability. Unlike some other work in this area, this thesis presents an approach that is completely client-driven and does not require specialized database management systems (DBMS). Also, unlike other client-driven work, this approach provides support for a richer set of schema updates, including vertical split (normalization), horizontal split, vertical and horizontal merge (union), difference, and intersection. The update process automatically generates a runtime update client from a mapping between the old and the new schemas. The solution has been validated by testing it on a relatively small database of around 300,000 records per table and less than 1 GB, but with a limited memory buffer size of 24 MB. This thesis presents a study of the overhead of the update process as a function of the transaction rate and the batch size used to copy data from the old to the new schema. It shows that the overhead introduced is minimal for medium-size applications and that the update can be achieved with no more than one minute of downtime.
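A vertical split with batched copying can be sketched with Python's built-in `sqlite3` module. The table names, columns, and batch size are hypothetical, and the sketch ignores concurrent client transactions, which the approach in the thesis has to handle; it only shows how data might be copied from an old schema to a new one in fixed-size batches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Old schema: one wide table (hypothetical example data).
conn.execute("CREATE TABLE person_old (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO person_old VALUES (?, ?, ?)",
    [(i, f"name{i}", f"city{i % 3}") for i in range(1, 11)],
)

# New schema: the vertical split separates the columns into two tables.
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE address (person_id INTEGER PRIMARY KEY, city TEXT)")


def copy_in_batches(conn, batch_size):
    """Copy rows from the old table to the new tables, one batch per transaction."""
    last_id = 0
    batches = 0
    while True:
        rows = conn.execute(
            "SELECT id, name, city FROM person_old WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        with conn:  # commits the batch, or rolls it back on error
            conn.executemany("INSERT INTO person VALUES (?, ?)",
                             [(r[0], r[1]) for r in rows])
            conn.executemany("INSERT INTO address VALUES (?, ?)",
                             [(r[0], r[2]) for r in rows])
        last_id = rows[-1][0]
        batches += 1
    return batches


batches = copy_in_batches(conn, batch_size=4)
print(batches)
```

Smaller batches keep each transaction short at the cost of more round trips, which is the kind of trade-off the overhead study in the abstract measures.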
Date Created
2011

An information diffusion approach to detecting emotional contagion in online social networks

Description
Internet sites that support user-generated content, so-called Web 2.0, have become part of the fabric of everyday life in technologically advanced nations. Users collectively spend billions of hours consuming and creating content on social networking sites, weblogs (blogs), and various other types of sites in the United States and around the world. Given the fundamentally emotional nature of humans and the amount of emotional content that appears on Web 2.0 sites, it is important to understand how such websites can affect the emotions of users. This work attempts to determine whether emotion spreads through an online social network (OSN). To this end, a method is devised that employs a model based on a general threshold diffusion model as a classifier to predict the propagation of emotion between users and their friends in an OSN by way of mood-labeled blog entries. The model generalizes existing information diffusion models in that the state machine representation of a node is generalized from being binary to having n states, in order to support the n class labels necessary to model emotional contagion. In the absence of ground truth, the prediction accuracy of the model is benchmarked against a baseline method that predicts the majority label of a user's emotion label distribution. The model significantly outperforms the baseline method in terms of prediction accuracy. The experimental results make a strong case for the existence of emotional contagion in OSNs in spite of possible alternative explanations such as confounding influence and homophily, since these alternatives are likely to have a negligible effect in a large dataset or simply do not apply to the domain of human emotions. A hybrid manual/automated method to map mood-labeled blog entries to a set of emotion labels is also presented, which enables the application of the model to a large set (approximately 900K) of blog entries from LiveJournal.
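The majority-label baseline mentioned above can be sketched as follows. The function names and the mood labels are made up for illustration, and the split into past and held-out labels is an assumption about how such a baseline might be scored.

```python
from collections import Counter


def majority_label(label_history):
    """Return the most frequent label in a user's past mood labels."""
    return Counter(label_history).most_common(1)[0][0]


def baseline_accuracy(users):
    """Score the baseline on (past_labels, held_out_labels) pairs."""
    correct = 0
    total = 0
    for past_labels, held_out_labels in users:
        guess = majority_label(past_labels)
        correct += sum(1 for label in held_out_labels if label == guess)
        total += len(held_out_labels)
    return correct / total if total else 0.0


# Made-up labels, for illustration only.
users = [
    (["happy", "happy", "sad"], ["happy", "sad"]),
    (["angry", "sad", "sad"], ["sad", "sad"]),
]
accuracy = baseline_accuracy(users)
print(accuracy)   # 0.75
```

Any model claiming to capture contagion has to beat this kind of baseline, since a user's own habitual mood already explains a large share of the labels.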
Date Created
2011

Enhancing the usability of complex structured data by supporting keyword searches

Description
As pointed out in the keynote speech by H. V. Jagadish at SIGMOD'07, and as commonly agreed in the database community, the usability of structured data by casual users is as important as the functionality of data management systems. A major difficulty in using structured data is retrieving information from it easily given a user's information needs. Learning and using a structured query language (e.g., SQL or XQuery) is overwhelmingly burdensome for most users, as not only are these languages sophisticated, but the users also need to know the data schema. Keyword search provides an opportunity to conveniently access structured data and can significantly enhance its usability. However, processing keyword search on structured data is challenging due to various types of ambiguity, such as structural ambiguity (keyword queries have no structure), keyword ambiguity (the keywords may not be accurate), and user preference ambiguity (the user may have implicit preferences that are not indicated in the query), as well as efficiency challenges due to the large search space. This dissertation performs an expansive study of keyword search processing techniques as a gateway for users to access structured data and retrieve desired information. The key issues addressed include: (1) Resolving structural ambiguities in keyword queries by generating meaningful query results, which involves identifying relevant keyword matches, identifying return information, and composing query results based on relevant matches and return information. (2) Resolving structural, keyword, and user preference ambiguities through result analysis, including snippet generation, result differentiation, result clustering, result summarization/query expansion, etc. (3) Resolving the efficiency challenge in processing keyword search on structured data by utilizing and efficiently maintaining materialized views.
These works deliver significant technical contributions towards building a full-fledged search engine for structured data.
Date Created
2011