Spatializing partisan gerrymandering forensics: local measures and spatial specifications

155931-Thumbnail Image.png
Description
Gerrymandering is a central problem for many representative democracies. Formally, gerrymandering is the manipulation of spatial boundaries to provide political advantage to a particular group (Warf, 2006). The term often refers to political district design, where the boundaries of political

Gerrymandering is a central problem for many representative democracies. Formally, gerrymandering is the manipulation of spatial boundaries to provide political advantage to a particular group (Warf, 2006). The term often refers to political district design, where the boundaries of political districts are “unnaturally” manipulated by redistricting officials to generate durable advantages for one group or party. Since free and fair elections are possibly the critical part of representative democracy, it is important for this cresting tide to have scientifically validated tools. This dissertation supports a current wave of reform by developing a general inferential technique to “localize” inferential bias measures, generating a new type of district-level score. The new method relies on the statistical intuition behind jackknife methods to construct relative local indicators. I find that existing statewide indicators of partisan bias can be localized using this technique, providing an estimate of how strongly a district impacts statewide partisan bias over an entire decade. When compared to measures of shape compactness (a common gerrymandering detection statistic), I find that weirdly-shaped districts have no consistent relationship with impact in many states during the 2000 and 2010 redistricting plan. To ensure that this work is valid, I examine existing seats-votes modeling strategies and develop a novel method for constructing seats-votes curves. I find that, while the empirical structure of electoral swing shows significant spatial dependence (even in the face of spatial heterogeneity), existing seats-votes specifications are more robust than anticipated to spatial dependence. Centrally, this dissertation contributes to the much larger social aim to resist electoral manipulation: that individuals & organizations suffer no undue burden on political access from partisan gerrymandering.
Date Created
2017
Agent

Policy and Place: A Spatial Data Science Framework for Research and Decision-Making

155841-Thumbnail Image.png
Description
A major challenge in health-related policy and program evaluation research is attributing underlying causal relationships where complicated processes may exist in natural or quasi-experimental settings. Spatial interaction and heterogeneity between units at individual or group levels can violate both components

A major challenge in health-related policy and program evaluation research is attributing underlying causal relationships where complicated processes may exist in natural or quasi-experimental settings. Spatial interaction and heterogeneity between units at individual or group levels can violate both components of the Stable-Unit-Treatment-Value-Assumption (SUTVA) that are core to the counterfactual framework, making treatment effects difficult to assess. New approaches are needed in health studies to develop spatially dynamic causal modeling methods to both derive insights from data that are sensitive to spatial differences and dependencies, and also be able to rely on a more robust, dynamic technical infrastructure needed for decision-making. To address this gap with a focus on causal applications theoretically, methodologically and technologically, I (1) develop a theoretical spatial framework (within single-level panel econometric methodology) that extends existing theories and methods of causal inference, which tend to ignore spatial dynamics; (2) demonstrate how this spatial framework can be applied in empirical research; and (3) implement a new spatial infrastructure framework that integrates and manages the required data for health systems evaluation.

The new spatially explicit counterfactual framework considers how spatial effects impact treatment choice, treatment variation, and treatment effects. To illustrate this new methodological framework, I first replicate a classic quasi-experimental study that evaluates the effect of drinking age policy on mortality in the United States from 1970 to 1984, and further extend it with a spatial perspective. In another example, I evaluate food access dynamics in Chicago from 2007 to 2014 by implementing advanced spatial analytics that better account for the complex patterns of food access, and quasi-experimental research design to distill the impact of the Great Recession on the foodscape. Inference interpretation is sensitive to both research design framing and underlying processes that drive geographically distributed relationships. Finally, I advance a new Spatial Data Science Infrastructure to integrate and manage data in dynamic, open environments for public health systems research and decision- making. I demonstrate an infrastructure prototype in a final case study, developed in collaboration with health department officials and community organizations.
Date Created
2017
Agent

A Geospatial Cyberinfrastructure for Urban Economic Analysis and Spatial Decision-Making

130331-Thumbnail Image.png
Description
Urban economic modeling and effective spatial planning are critical tools towards achieving urban sustainability. However, in practice, many technical obstacles, such as information islands, poor documentation of data and lack of software platforms to facilitate virtual collaboration, are challenging the

Urban economic modeling and effective spatial planning are critical tools towards achieving urban sustainability. However, in practice, many technical obstacles, such as information islands, poor documentation of data and lack of software platforms to facilitate virtual collaboration, are challenging the effectiveness of decision-making processes. In this paper, we report on our efforts to design and develop a geospatial cyberinfrastructure (GCI) for urban economic analysis and simulation. This GCI provides an operational graphic user interface, built upon a service-oriented architecture to allow (1) widespread sharing and seamless integration of distributed geospatial data; (2) an effective way to address the uncertainty and positional errors encountered in fusing data from diverse sources; (3) the decomposition of complex planning questions into atomic spatial analysis tasks and the generation of a web service chain to tackle such complex problems; and (4) capturing and representing provenance of geospatial data to trace its flow in the modeling task. The Greater Los Angeles Region serves as the test bed. We expect this work to contribute to effective spatial policy analysis and decision-making through the adoption of advanced GCI and to broaden the application coverage of GCI to include urban economic simulations.
Date Created
2013-05-21
Agent

A taxonomy of parallel vector spatial analysis algorithms

154079-Thumbnail Image.png
Description
Nearly 25 years ago, parallel computing techniques were first applied to vector spatial analysis methods. This initial research was driven by the desire to reduce computing times in order to support scaling to larger problem sets. Since this initial work,

Nearly 25 years ago, parallel computing techniques were first applied to vector spatial analysis methods. This initial research was driven by the desire to reduce computing times in order to support scaling to larger problem sets. Since this initial work, rapid technological advancement has driven the availability of High Performance Computing (HPC) resources, in the form of multi-core desktop computers, distributed geographic information processing systems, e.g. computational grids, and single site HPC clusters. In step with increases in computational resources, significant advancement in the capabilities to capture and store large quantities of spatially enabled data have been realized. A key component to utilizing vast data quantities in HPC environments, scalable algorithms, have failed to keep pace. The National Science Foundation has identified the lack of scalable algorithms in codified frameworks as an essential research product. Fulfillment of this goal is challenging given the lack of a codified theoretical framework mapping atomic numeric operations from the spatial analysis stack to parallel programming paradigms, the diversity in vernacular utilized by research groups, the propensity for implementations to tightly couple to under- lying hardware, and the general difficulty in realizing scalable parallel algorithms. This dissertation develops a taxonomy of parallel vector spatial analysis algorithms with classification being defined by root mathematical operation and communication pattern, a computational dwarf. Six computational dwarfs are identified, three being drawn directly from an existing parallel computing taxonomy and three being created to capture characteristics unique to spatial analysis algorithms. The taxonomy provides a high-level classification decoupled from low-level implementation details such as hardware, communication protocols, implementation language, decomposition method, or file input and output. By taking a high-level approach implementation specifics are broadly proposed, breadth of coverage is achieved, and extensibility is ensured. The taxonomy is both informed and informed by five case studies im- plemented across multiple, divergent hardware environments. A major contribution of this dissertation is a theoretical framework to support the future development of concrete parallel vector spatial analysis frameworks through the identification of computational dwarfs and, by extension, successful implementation strategies.
Date Created
2015
Agent

Open Geospatial Analytics with PySAL

130375-Thumbnail Image.png
Description
This article reviews the range of delivery platforms that have been developed for the PySAL open source Python library for spatial analysis. This includes traditional desktop software (with a graphical user interface, command line or embedded in a computational notebook),

This article reviews the range of delivery platforms that have been developed for the PySAL open source Python library for spatial analysis. This includes traditional desktop software (with a graphical user interface, command line or embedded in a computational notebook), open spatial analytics middleware, and web, cloud and distributed open geospatial analytics for decision support. A common thread throughout the discussion is the emphasis on openness, interoperability, and provenance management in a scientific workflow. The code base of the PySAL library provides the common computing framework underlying all delivery mechanisms.
Date Created
2015-06-01
Agent

Tile-based methods for online choropleth mapping: a scalability evaluation

152171-Thumbnail Image.png
Description

Choropleth maps are a common form of online cartographic visualization. They reveal patterns in spatial distributions of a variable by associating colors with data values measured at areal units. Although this capability of pattern revelation has popularized the use of

Choropleth maps are a common form of online cartographic visualization. They reveal patterns in spatial distributions of a variable by associating colors with data values measured at areal units. Although this capability of pattern revelation has popularized the use of choropleth maps, existing methods for their online delivery are limited in supporting dynamic map generation from large areal data. This limitation has become increasingly problematic in online choropleth mapping as access to small area statistics, such as high-resolution census data and real-time aggregates of geospatial data streams, has never been easier due to advances in geospatial web technologies. The current literature shows that the challenge of large areal data can be mitigated through tiled maps where pre-processed map data are hierarchically partitioned into tiny rectangular images or map chunks for efficient data transmission. Various approaches have emerged lately to enable this tile-based choropleth mapping, yet little empirical evidence exists on their ability to handle spatial data with large numbers of areal units, thus complicating technical decision making in the development of online choropleth mapping applications. To fill this knowledge gap, this dissertation study conducts a scalability evaluation of three tile-based methods discussed in the literature: raster, scalable vector graphics (SVG), and HTML5 Canvas. For the evaluation, the study develops two test applications, generates map tiles from five different boundaries of the United States, and measures the response times of the applications under multiple test operations. While specific to the experimental setups of the study, the evaluation results show that the raster method scales better across various types of user interaction than the other methods. Empirical evidence also points to the superior scalability of Canvas to SVG in dynamic rendering of vector tiles, but not necessarily for partial updates of the tiles. These findings indicate that the raster method is better suited for dynamic choropleth rendering from large areal data, while Canvas would be more suitable than SVG when such rendering frequently involves complete updates of vector shapes.

Date Created
2013
Agent

Essays on space-time interaction tests

151878-Thumbnail Image.png
Description
Researchers across a variety of fields are often interested in determining if data are of a random nature or if they exhibit patterning which may be the result of some alternative and potentially more interesting process. This dissertation explores a

Researchers across a variety of fields are often interested in determining if data are of a random nature or if they exhibit patterning which may be the result of some alternative and potentially more interesting process. This dissertation explores a family of statistical methods, i.e. space-time interaction tests, designed to detect structure within three-dimensional event data. These tests, widely employed in the fields of spatial epidemiology, criminology, ecology and beyond, are used to identify synergistic interaction across the spatial and temporal dimensions of a series of events. Exploration is needed to better understand these methods and determine how their results may be affected by data quality problems commonly encountered in their implementation; specifically, how inaccuracy and/or uncertainty in the input data analyzed by the methods may impact subsequent results. Additionally, known shortcomings of the methods must be ameliorated. The contributions of this dissertation are twofold: it develops a more complete understanding of how input data quality problems impact the results of a number of global and local tests of space-time interaction and it formulates an improved version of one global test which accounts for the previously identified problem of population shift bias. A series of simulation experiments reveal the global tests of space-time interaction explored here to be dramatically affected by the aforementioned deficiencies in the quality of the input data. It is shown that in some cases, a conservative degree of these common data problems can completely obscure evidence of space-time interaction and in others create it where it does not exist. Conversely, a local metric of space-time interaction examined here demonstrates a surprising robustness in the face of these same deficiencies. This local metric is revealed to be only minimally affected by the inaccuracies and incompleteness introduced in these experiments. Finally, enhancements to one of the global tests are presented which solve the problem of population shift bias associated with the test and better contextualize and visualize its results, thereby enhancing its utility for practitioners.
Date Created
2013
Agent

Addressing geographic uncertainty in spatial optimization

151538-Thumbnail Image.png
Description
There exist many facets of error and uncertainty in digital spatial information. As error or uncertainty will not likely ever be completely eliminated, a better understanding of its impacts is necessary. Spatial analytical approaches, in particular, must somehow address data

There exist many facets of error and uncertainty in digital spatial information. As error or uncertainty will not likely ever be completely eliminated, a better understanding of its impacts is necessary. Spatial analytical approaches, in particular, must somehow address data quality issues. This can range from evaluating impacts of potential data uncertainty in planning processes that make use of methods to devising methods that explicitly account for error/uncertainty. To date, little has been done to structure methods accounting for error. This research focuses on developing methods to address geographic data uncertainty in spatial optimization. An integrated approach that characterizes uncertainty impacts by constructing and solving a new multi-objective model that explicitly incorporates facets of data uncertainty is developed. Empirical findings illustrate that the proposed approaches can be applied to evaluate the impacts of data uncertainty with statistical confidence, which moves beyond popular practices of simulating errors in data. Spatial uncertainty impacts are evaluated in two contexts: harvest scheduling and sex offender residency. Owing to the integration of spatial uncertainty, the detailed multi-objective models are more complex and computationally challenging to solve. As a result, a new multi-objective evolutionary algorithm is developed to address the computational challenges posed. The proposed algorithm incorporates problem-specific spatial knowledge to significantly enhance the capability of the evolutionary algorithm for solving the model.  
Date Created
2013
Agent

Spatiotemporal data mining, analysis, and visualization of human activity data

151349-Thumbnail Image.png
Description
This dissertation addresses the research challenge of developing efficient new methods for discovering useful patterns and knowledge in large volumes of electronically collected spatiotemporal activity data. I propose to analyze three types of such spatiotemporal activity data in a methodological

This dissertation addresses the research challenge of developing efficient new methods for discovering useful patterns and knowledge in large volumes of electronically collected spatiotemporal activity data. I propose to analyze three types of such spatiotemporal activity data in a methodological framework that integrates spatial analysis, data mining, machine learning, and geovisualization techniques. Three different types of spatiotemporal activity data were collected through different data collection approaches: (1) crowd sourced geo-tagged digital photos, representing people's travel activity, were retrieved from the website Panoramio.com through information retrieval techniques; (2) the same techniques were used to crawl crowd sourced GPS trajectory data and related metadata of their daily activities from the website OpenStreetMap.org; and finally (3) preschool children's daily activities and interactions tagged with time and geographical location were collected with a novel TabletPC-based behavioral coding system. The proposed methodology is applied to these data to (1) automatically recommend optimal multi-day and multi-stay travel itineraries for travelers based on discovered attractions from geo-tagged photos, (2) automatically detect movement types of unknown moving objects from GPS trajectories, and (3) explore dynamic social and socio-spatial patterns of preschool children's behavior from both geographic and social perspectives.
Date Created
2012
Agent

The centralization index as a measure of local spatial segregation

151109-Thumbnail Image.png
Description
Decades ago in the U.S., clear lines delineated which neighborhoods were acceptable for certain people and which were not. Techniques such as steering and biased mortgage practices continue to perpetuate a segregated outcome for many residents. In contrast, ethnic enclaves

Decades ago in the U.S., clear lines delineated which neighborhoods were acceptable for certain people and which were not. Techniques such as steering and biased mortgage practices continue to perpetuate a segregated outcome for many residents. In contrast, ethnic enclaves and age restricted communities are viewed as voluntary segregation based on cultural and social amenities. This diversity surrounding the causes of segregation are not just region-wide characteristics, but can vary within a region. Local segregation analysis aims to uncover this local variation, and hence open the door to policy solutions not visible at the global scale. The centralization index, originally introduced as a global measure of segregation focused on spatial concentration of two population groups relative a region's urban center, has lost relevancy in recent decades as regions have become polycentric, and the index's magnitude is sensitive to the particular point chosen as the center. These attributes, which make it a poor global measure, are leveraged here to repurpose the index as a local measure. The index's ability to differentiate minority from majority segregation, and its focus on a particular location within a region make it an ideal local segregation index. Based on the local centralization index for two groups, a local multigroup variation is defined, and a local space-time redistribution index is presented capturing change in concentration of a single population group over two time periods. Permutation based inference approaches are used to test the statistical significance of measured index values. Applications to the Phoenix, Arizona metropolitan area show persistent cores of black and white segregation over the years 1990, 2000 and 2010, and a trend of white segregated neighborhoods increasing at a faster rate than black. An analysis of the Phoenix area's recently opened light rail system shows that its 28 stations are located in areas of significant white, black and Hispanic segregation, and there is a clear concentration of renters over owners around most stations. There is little indication of statistically significant change in segregation or population concentration around the stations, indicating a lack of near term impact of light rail on the region's overall demographics.
Date Created
2012
Agent