A Data-driven, High-performance and Intelligent CyberInfrastructure to Advance Spatial Sciences

Description
In the field of Geographic Information Science (GIScience), we have witnessed an unprecedented data deluge brought about by the rapid advancement of high-resolution data observing technologies. For example, with the advancement of Earth Observation (EO) technologies, a massive amount of EO data, including remote sensing data and other sensor observation data about earthquakes, climate, oceans, hydrology, volcanoes, glaciers, etc., is being collected on a daily basis by a wide range of organizations. In addition to the observation data, human-generated data including microblogs, photos, consumption records, evaluations, unstructured webpages and other Volunteered Geographical Information (VGI) are incessantly generated and shared on the Internet.

Meanwhile, the emerging cyberinfrastructure rapidly increases our capacity for handling such massive data with regard to data collection and management, data integration and interoperability, data transmission and visualization, high-performance computing, etc. Cyberinfrastructure (CI) consists of computing systems, data storage systems, advanced instruments and data repositories, visualization environments, and people, all linked together by software and high-performance networks to improve research productivity and enable breakthroughs that are not otherwise possible.

The Geospatial CI (GCI, or CyberGIS), as the synthesis of CI and GIScience, has inherent advantages in enabling computationally intensive spatial analysis and modeling (SAM) and collaborative geospatial problem solving and decision making.

This dissertation is dedicated to addressing several critical issues and improving the performance of existing methodologies and systems in the field of CyberGIS. It comprises three parts. The first part focuses on developing methodologies that help public researchers efficiently and effectively find appropriate open geospatial datasets among millions of records provided by thousands of organizations scattered around the world. Machine learning and semantic search methods are utilized in this research. The second part develops an interoperable and replicable geoprocessing service by synthesizing the high-performance computing (HPC) environment, the core spatial statistics/analysis algorithms from the widely adopted open-source Python package, the Python Spatial Analysis Library (PySAL), and the rich datasets acquired in the first part. The third part studies optimization strategies for feature data transmission and visualization. It is intended to solve the performance issues that arise when transmitting large feature data over the Internet and visualizing it on the client (browser) side.

Taken together, the three parts constitute an endeavor towards the methodological improvement and implementation practice of the data-driven, high-performance and intelligent CI to advance spatial sciences.
Date Created
2018

The Perception of Graph Properties In Graph Layouts

Description
When looking at drawings of graphs, questions about graph density, community structures, local clustering and other graph properties may be of critical importance for analysis. While graph layout algorithms have focused on optimizing edge crossings, symmetry, and other such layout properties, little is known about how these algorithms relate to a user’s ability to perceive graph properties for a given graph layout. This study applies previously established methodologies for perceptual analysis to identify which graph drawing layout will help the user best perceive a particular graph property. A large-scale (n = 588) crowdsourced experiment is conducted to investigate whether the perception of two graph properties (graph density and average local clustering coefficient) can be modeled using Weber’s law. Three graph layout algorithms from three representative classes (Force Directed - FD, Circular, and Multi-Dimensional Scaling - MDS) are studied, and the results of this experiment establish the precision of judgment for these graph layouts and properties. The findings demonstrate that the perception of graph density can be modeled with Weber’s law. Furthermore, the perception of the average clustering coefficient can be modeled as an inverse of Weber’s law, and the MDS layout showed a significantly different precision of judgment than the FD layout.
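For readers unfamiliar with Weber’s law, the following is a minimal sketch of its usual form: the just-noticeable difference (JND) grows in proportion to the baseline stimulus. The function names and the Weber fraction of 0.1 are illustrative assumptions and are not values reported by the study.

```python
def weber_jnd(intensity: float, weber_fraction: float) -> float:
    """Just-noticeable difference under Weber's law: delta_I = k * I."""
    return weber_fraction * intensity


def is_noticeable(base: float, comparison: float, weber_fraction: float) -> bool:
    """True if the change from base to comparison meets or exceeds the JND at base."""
    return abs(comparison - base) >= weber_jnd(base, weber_fraction)


# Hypothetical Weber fraction, chosen only for illustration.
K = 0.1

# The same absolute change in graph density (0.03) is noticeable at a low
# baseline density but not at a high one, because the JND scales with the baseline.
low_density_change = is_noticeable(0.20, 0.23, K)
high_density_change = is_noticeable(0.60, 0.63, K)
print(low_density_change, high_density_change)
```

Under this model, a fixed difference in density becomes harder to judge as the baseline density rises.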
Date Created
2018

A Framework for Spatial Database Explanations

Description
In the last few years, there has been a tremendous increase in the use of big data. Most of this data is hard to understand because of its size and dimensions. The importance of this problem is underlined by the fact that the Big Data Research and Development Initiative was announced by the United States administration in 2012 to address problems faced by the government. Various states and cities in the US gather spatial data about incidents like police calls for service.

When we query large amounts of data, the results may raise a lot of questions. For example, when we look at arithmetic relationships between queries in heterogeneous data, there are a lot of differences. How can we explain what factors account for these differences? If we define the observation as an arithmetic relationship between queries, this kind of problem can be solved by aggravation or intervention. Aggravation views the value of our observation for different sets of tuples, while intervention looks at the value of the observation after removing sets of tuples. We call the predicates that represent these tuples explanations. Observations by themselves have limited importance. For example, if we observe a large number of taxi trips in a specific area, we might ask the question: Why are there so many trips here? Explanations attempt to answer these kinds of questions.
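The distinction between aggravation and intervention described above can be sketched as follows. This is a toy illustration and not the thesis’s implementation: the observation (a ratio of trip counts between two zones), the `zone` and `hour` fields, and the sample rows are all hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Predicate = Callable[[Row], bool]


def observation(rows: List[Row]) -> float:
    """Hypothetical observation: ratio of trips in zone A to trips in zone B."""
    a = sum(1 for r in rows if r["zone"] == "A")
    b = sum(1 for r in rows if r["zone"] == "B")
    return a / b if b else float("inf")


def aggravation(rows: List[Row], predicate: Predicate) -> float:
    """Value of the observation restricted to tuples that satisfy the predicate."""
    return observation([r for r in rows if predicate(r)])


def intervention(rows: List[Row], predicate: Predicate) -> float:
    """Value of the observation after removing tuples that satisfy the predicate."""
    return observation([r for r in rows if not predicate(r)])


# Made-up sample data for illustration only.
trips = [
    {"zone": "A", "hour": "night"},
    {"zone": "A", "hour": "night"},
    {"zone": "A", "hour": "night"},
    {"zone": "A", "hour": "day"},
    {"zone": "B", "hour": "night"},
    {"zone": "B", "hour": "day"},
]


def is_night(row: Row) -> bool:
    """Candidate explanation: the trip happened at night."""
    return row["hour"] == "night"


baseline = observation(trips)                 # 4 A-trips / 2 B-trips
night_only = aggravation(trips, is_night)     # 3 A-trips / 1 B-trip
without_night = intervention(trips, is_night) # 1 A-trip / 1 B-trip
print(baseline, night_only, without_night)
```

In this toy data, the predicate "hour is night" is a plausible explanation for the imbalance: restricting to night trips raises the ratio, and removing them lowers it.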

While aggravation and intervention are designed for non-spatial data, we propose a new approach for explaining spatially heterogeneous data. Our approach expands on aggravation and intervention while using spatial partitioning/clustering to improve explanations for spatial data. Our proposed approach was evaluated against a real-world taxi dataset as well as a synthetic disease outbreak dataset. The approach was found to outperform aggravation in precision and recall while outperforming intervention in precision.
Date Created
2018

A Spatial Decision Support System for Oil Spill Response and Recovery

Description
Coastal areas are susceptible to man-made disasters, such as oil spills, which not only have a dreadful impact on the lives of coastal communities and businesses but also have lasting and hazardous consequences. The United States coastal areas, especially the Gulf of Mexico, have witnessed devastating oil spills of varied sizes and durations that resulted in major economic and ecological losses. These disasters affected the oil, housing, forestry, tourism, and fishing industries with overall costs exceeding billions of dollars (Baade et al., 2007; Smith et al., 2011). Extensive research has been done with respect to oil spill simulation techniques, spatial optimization models, and innovative strategies to deal with spill response and planning efforts. However, most of the research done in those areas is done independently of each other, leaving a conceptual void between them.

In the following work, this thesis presents a Spatial Decision Support System (SDSS), which efficiently integrates the independent facets of spill modeling techniques and spatial optimization to enable officials to investigate and explore the various options to clean up an offshore oil spill to make a more informed decision. This thesis utilizes Blowout and Spill Occurrence Model (BLOSOM) developed by Sim et al. (2015) to simulate hypothetical oil spill scenarios, followed by the Oil Spill Cleanup and Operational Model (OSCOM) developed by Grubesic et al. (2017) to spatially optimize the response efforts. The results of this combination are visualized in the SDSS, featuring geographical maps, so the boat ramps from which the response should be launched can be easily identified along with the amount of oil that hits the shore thereby visualizing the intensity of the impact of the spill in the coastal areas for various cleanup targets.
Date Created
2018

Detecting Frames and Causal Relationships in Climate Change Related Text Databases Based on Semantic Features

Description
The subliminal impact of framing of social, political and environmental issues such as climate change has been studied for decades in political science and communications research. Media framing offers an “interpretative package” for average citizens on how to make sense of climate change and its consequences to their livelihoods, how to deal with its negative impacts, and which mitigation or adaptation policies to support. A line of related work has used bag-of-words and word-level features to detect frames automatically in text. Such works face limitations since standard keyword-based features may not generalize well to accommodate surface variations in text when different keywords are used for similar concepts.

This thesis develops a unique type of textual feature that generalizes triplets extracted from text by clustering them into high-level concepts. These concepts are utilized as features to detect frames in text. Compared to uni-gram and bi-gram based models, classification and clustering using generalized concepts yield better discriminating features and a higher classification accuracy, with a 12% relative boost (i.e., from 74% to 83% F-measure) and 0.91 clustering purity for Frame/Non-Frame detection.

The automatic discovery of complex causal chains among interlinked events and their participating actors has not yet been thoroughly studied. Previous studies related to extracting causal relationships from text were based on laborious and incomplete hand-developed lists of explicit causal verbs, such as “causes” and “results in.” Such approaches result in limited recall because standard causal verbs may not generalize well to accommodate surface variations in texts when different keywords and phrases are used to express similar causal effects. Therefore, I present a system that utilizes generalized concepts to extract causal relationships. The proposed algorithms overcome surface variations in written expressions of causal relationships and discover the domino effects between climate events and human security. This semi-supervised approach alleviates the need for labor-intensive keyword list development and annotated datasets. Experimental evaluations by domain experts achieve an average precision of 82%. Qualitative assessments of causal chains show that results are consistent with the 2014 IPCC report illuminating causal mechanisms underlying the linkages between climatic stresses and social instability.
Date Created
2018

Mining Marked Nodes in Large Graphs

Description
With the rise of the Big Data Era, an enormous amount of network data is being generated at an unprecedented rate across a wide range of high-impact micro and macro areas of research, from protein interaction to social networks. The critical challenge is translating this large-scale network data into actionable information.

A key task in the data translation is the analysis of network connectivity via marked nodes, which is the primary focus of our research. We have developed a framework for analyzing network connectivity via marked nodes in large-scale graphs, utilizing novel algorithms in three interrelated areas: (1) analysis of a single seed node via its ego-centric network (AttriPart algorithm); (2) pathway identification between two seed nodes (K-Simple Shortest Paths Multithreaded and Search Reduced (KSSPR) algorithm); and (3) tree detection, defining the interaction between three or more seed nodes (Shortest Path MST algorithm).
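To make the pathway-identification task in item (2) concrete, the sketch below enumerates the k shortest simple paths between two seed nodes in a small unweighted graph. It is a naive best-first baseline written for illustration, not the KSSPR algorithm (it has neither multithreading nor search-space reduction), and the graph, node names, and function name are hypothetical.

```python
import heapq
from typing import Dict, List

Graph = Dict[str, List[str]]


def k_simple_shortest_paths(graph: Graph, source: str, target: str, k: int) -> List[List[str]]:
    """Return up to k shortest simple (loop-free) paths by hop count.

    Partial paths are expanded in order of length, so complete paths are
    found in nondecreasing length. Worst-case cost is exponential, which is
    why specialized algorithms are needed on large graphs.
    """
    found: List[List[str]] = []
    heap = [(0, [source])]  # entries are (hop count, path so far)
    while heap and len(found) < k:
        length, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            found.append(path)
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in path:  # keep the path simple
                heapq.heappush(heap, (length + 1, path + [neighbor]))
    return found


# Hypothetical undirected graph given as adjacency lists; "s" and "t" are the seed nodes.
toy_graph: Graph = {
    "s": ["a", "b"],
    "a": ["s", "t", "b"],
    "b": ["s", "a", "t"],
    "t": ["a", "b"],
}

paths = k_simple_shortest_paths(toy_graph, "s", "t", 3)
print(paths)
```

On this toy graph the two 2-hop paths are returned before any 3-hop path, which is the ordering a k-simple-shortest-paths query is expected to produce.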

In an effort to address both fundamental and applied research issues, we have developed the LocalForecasting algorithm to explore how network connectivity analysis can be applied to local community evolution and recommender systems. The goal is to apply the LocalForecasting algorithm to various domains, e.g., friend suggestions in social networks or future collaboration in co-authorship networks. This algorithm utilizes link prediction in combination with the AttriPart algorithm to predict future connections in local graph partitions.

Results show that our proposed AttriPart algorithm finds up to 1.6x denser local partitions, while running approximately 43x faster than traditional local partitioning techniques (PageRank-Nibble). In addition, our LocalForecasting algorithm demonstrates a significant improvement in the number of nodes and edges correctly predicted over baseline methods. Furthermore, results for the KSSPR algorithm demonstrate a speed-up of up to 2.5x over the standard k-simple shortest paths algorithm.
Date Created
2018

Stakeholder Analysis for the Food-Energy-Water Nexus in Phoenix, Arizona: Implications for Nexus Governance

Description

Understanding the food-energy-water nexus is necessary to identify risks and inform strategies for nexus governance to support resilient, secure, and sustainable societies. To manage risks and realize efficiencies, we must understand not only how these systems are physically connected but also how they are institutionally linked. It is important to understand how actors who make planning, management, and policy decisions understand the relationships among components of the systems. Our question is: How do stakeholders involved in food, energy, and water governance in Phoenix, Arizona understand the nexus and what are the implications for integrated nexus governance? We employ a case study design, generate qualitative data through focus groups and interviews, and conduct a content analysis. While stakeholders in the Phoenix area who are actively engaged in food, energy, and water systems governance appreciate the rationale for nexus thinking, they recognize practical limitations to implementing these concepts. Concept maps of nexus interactions provide one view of system interconnections that can be used to complement other ways of knowing the nexus, such as physical infrastructure system diagrams or actor-networks. Stakeholders believe nexus governance could be improved through awareness and education, consensus and collaboration, transparency, economic incentives, working across scales, and incremental reforms.

Date Created
2017-11-29

Visual Event Cueing in Linked Spatiotemporal Data

Description
The media disperses a large amount of information daily pertaining to political events, social movements, and societal conflicts. Media pertaining to these topics, no matter the format of publication used, are framed in a particular way. Framing is used not just for guiding audiences to desired beliefs, but also to fuel societal change or legitimize/delegitimize social movements. For this reason, tools that can help to clarify when changes in social discourse occur and identify their causes are of great use. This thesis presents a visual analytics framework that allows for the exploration and visualization of changes that occur in social climate with respect to space and time. Focusing on the links between data from the Armed Conflict Location and Event Data Project (ACLED) and a streaming RSS news data set, users can be cued into interesting events, enabling them to form and explore hypotheses. This visual analytics framework also focuses on improving intervention detection, allowing users to hypothesize about correlations between events and happiness levels, and supports collaborative analysis.
Date Created
2017

Evaluation of Storage Systems for Big Data Analytics

Description
Recent trends in big data storage systems show a shift from disk-centric models to memory-centric models. The primary challenges faced by these systems are speed, scalability, and fault tolerance. It is interesting to investigate the performance of these two models with respect to some big data applications. This thesis studies the performance of Ceph (a disk-centric model) and Alluxio (a memory-centric model) and evaluates whether a hybrid model provides any performance benefits with respect to big data applications. To this end, an application, TechTalk, is created that uses Ceph to store data and Alluxio to perform data analytics. The functionalities of the application include offline lecture storage, live recording of classes, content analysis and reference generation. The knowledge base of videos is constructed by analyzing the offline data using machine learning techniques. This training dataset provides knowledge to construct the index of an online stream. The indexed metadata enables the students to search, view and access the relevant content. The performance of the application is benchmarked in different use cases to demonstrate the benefits of the hybrid model.
Date Created
2017

A Visual Analytics Process for Exploring Risk and Vulnerability in International Food Trade Networks

Description
The rise in globalization has led to regional climate events having an increased effect on global food security. These indirect first- and second-order effects are generally geographically disparate from the region experiencing the climate event. Without understanding the topology of the food trade network, international aid may be naively directed to the countries directly experiencing the climate event and not to countries that will face potential food insecurity due to that event. This thesis focuses on the development of a visual analytics system for exploring second-order effects of climate change under the lens of global trade. In order to visualize how climate change impacts the world trade network of agricultural goods, I have developed an interactive data visualization platform for analysis of the interaction between climate events and the trade network. The proposed visual analytics system focuses on visualizing current trade dependencies at a more granular level than the currently available tools and on aiding in the identification of future vulnerabilities. To demonstrate the applicability of the tool, two case studies are described. The first case study focuses on the Chinese drought of 2011 and its impact on the global trade network and food security. The second case study models the potential impact of a climate event affecting production in the United States, a large supplier of corn, to demonstrate the potential consequence of cascading effects in the global trade network.
Date Created
2017