Shrivastava, Aviral

R^2IM: Reliable and Robust Intersection Manager Robust to Rogue Vehicles

Description

At modern-day intersections, traffic lights and stop signs assist human drivers to cross the intersection safely. Traffic congestion in urban road networks is a costly problem that affects all major cities. Efficiently operating intersections is largely dependent on accuracy and precision of human drivers, engendering a lingering uncertainty of attaining safety and high throughput. To improve the efficiency of the existing traffic network and mitigate the effects of human error in the intersection, many studies have proposed autonomous, intelligent transportation systems. These studies often involve utilizing connected autonomous vehicles, implementing a supervisory system, or both. Implementing a supervisory system is relatively more popular due to the security concerns of vehicle-to-vehicle communication. Even though supervisory systems are a step in the right direction for security, many supervisory systems’ safe operation solely relies on the promise of connected data being correct, making system reliability difficult to achieve. To increase fault-tolerance and decrease the effects of position uncertainty, this thesis proposes the Reliable and Robust Intersection Manager, a supervisory system that uses a separate surveillance system to dependably detect vehicles present in the intersection in order to create data redundancy for more accurate scheduling of connected autonomous vehicles. Adding the Surveillance System ensures that the temporal safety buffers between arrival times of connected autonomous vehicles are maintained. This guarantees that connected autonomous vehicles can traverse the intersection safely in the event of large vehicle controller error, a single rogue car entering the intersection, or a sybil attack. To test the proposed system given these fault-models, MATLAB® was used to create simulations in order to observe the functionality of R2IM compared to the state-of-the-art supervisory system, Robust Intersection Manager. Though R2IM is less efficient than the Robust Intersection Manager, it considers more fault models. The Robust Intersection Manager failed to maintain safety in the event of large vehicle controller errors and rogue cars, however R2IM resulted in zero collisions.

Date Created

2019

Agent

Author (aut): Dedinsky, Rachel
Thesis advisor (ths): Shrivastava, Aviral
Committee member: Sen, Arunabha
Committee member: Syrotiuk, Violet
Publisher (pbl): Arizona State University

Implementation of Graph Kernels on Multi core Architecture

Description

Graphs are one of the key data structures for many real-world computing applica-

tions such as machine learning, social networks, genomics etc. The main challenges of

graph processing include diculty in parallelizing the workload that results in work-

load imbalance, poor memory locality and very large number of memory accesses.

This causes large-scale graph processing to be very expensive.

This thesis presents implementation of a select set of graph kernels on a multi-core

architecture, Transmuter. The kernels are Breadth-First Search (BFS), Page Rank

(PR), and Single Source Shortest Path (SSSP). Transmuter is a multi-tiled architec-

ture with 4 tiles and 16 general processing elements (GPE) per tile that supports a

two level cache hierarchy. All graph processing kernels have been implemented on

Transmuter using Gem5 architectural simulator.

The key pre-processing steps in improving the performance are static partition-

ing by destination and balancing the workload among the processing cores. Results

obtained by processing graphs that are partitioned against un-partitioned graphs

show almost 3x improvement in performance. Choice of data structure also plays an

important role in the amount of storage space consumed and the amount of synchro-

nization required in a parallel implementation. Here the compressed sparse column

data format was used. BFS and SSSP are frontier-based algorithms where a frontier

represents a subset of vertices that are active during the current iteration. They

were implemented using the Boolean frontier array data structure. PR is an iterative

algorithm where all vertices are active at all times.

The performance of the dierent Transmuter implementations for the 14nm node

were evaluated based on metrics such as power consumption (Watt), Giga Operations

Per Second(GOPS), GOPS/Watt and L1/L2 cache misses. GOPS/W numbers for

graphs with 10k nodes and 10k edges is 33 for BFS, 477 for PR and 10 for SSSP.

i

Frontier-based algorithms have much lower GOPS/W compared to iterative algo-

rithms such as PR. This is because all nodes in Page Rank are active at all points

in time. For all three kernel implementations, the L1 cache miss rates are quite low

while the L2 cache hit rates are high.

Date Created

2019

Agent

Author (aut): RENGANATHAN, SRINIDHI
Thesis advisor (ths): Chakrabarti, Chaitali
Committee member: Shrivastava, Aviral
Committee member: Mudge, Trevor
Publisher (pbl): Arizona State University

Application-aware Performance Optimization for Software Managed Manycore Architectures

Description

One of the main goals of computer architecture design is to improve performance without much increase in the power consumption. It cannot be achieved by adding increasingly complex intelligent schemes in the hardware, since they will become increasingly less power-efficient. Therefore, parallelism comes up as the solution. In fact, the irrevocable trend of computer design in near future is still to keep increasing the number of cores while reducing the operating frequency. However, it is not easy to scale number of cores. One important challenge is that existing cores consume too much power. Another challenge is that cache-based memory hierarchy poses a serious limitation due to the rapidly increasing demand of area and power for coherence maintenance.

In this dissertation, opportunities to resolve the aforementioned issues were explored in two aspects.

Firstly, the possibility of removing hardware cache altogether, and replacing it with scratchpad memory with software management was explored. Scratchpad memory consumes much less power than caches. However, as data management logic is completely shifted to Software, how to reduce software overhead is challenging. This thesis presents techniques to manage scratchpad memory judiciously by exploiting application semantics and knowledge of data access patterns, thereby enabling optimization of data movement across the memory hierarchy. Experimental results show that the optimization was able to reduce stack data management overhead by 13X, produce better code mapping in more than 80% of the case, and improve performance by 83% in heap management.

Secondly, the possibility of using software branch hinting to replace hardware branch prediction to completely eliminate power consumption on corresponding hardware components was explored. As branch predictor is removed from hardware, software logic is responsible for reducing branch penalty. Techniques to minimize the branch penalty by optimizing branch hint placement were proposed, which can reduce branch penalty by 35.4% over the state-of-the-art.

Date Created

2019

Agent

Author (aut): Lu, Jing
Thesis advisor (ths): Shrivastava, Aviral
Committee member: Sarjoughian, Hessam S.
Committee member: Wu, Carole-Jean
Committee member: Doupe, Adam
Publisher (pbl): Arizona State University

Search-based Test Generation for Automated Driving Systems: From Perception to Control Logic

Description

Automated driving systems are in an intensive research and development stage, and the companies developing these systems are targeting to deploy them on public roads in a very near future. Guaranteeing safe operation of these systems is crucial as they are planned to carry passengers and share the road with other vehicles and pedestrians. Yet, there is no agreed-upon approach on how and in what detail those systems should be tested. Different organizations have different testing approaches, and one common approach is to combine simulation-based testing with real-world driving.

One of the expectations from fully-automated vehicles is never to cause an accident. However, an automated vehicle may not be able to avoid all collisions, e.g., the collisions caused by other road occupants. Hence, it is important for the system designers to understand the boundary case scenarios where an autonomous vehicle can no longer avoid a collision. Besides safety, there are other expectations from automated vehicles such as comfortable driving and minimal fuel consumption. All safety and functional expectations from an automated driving system should be captured with a set of system requirements. It is challenging to create requirements that are unambiguous and usable for the design, testing, and evaluation of automated driving systems. Another challenge is to define useful metrics for assessing the testing quality because in general, it is impossible to test every possible scenario.

The goal of this dissertation is to formalize the theory for testing automated vehicles. Various methods for automatic test generation for automated-driving systems in simulation environments are presented and compared. The contributions presented in this dissertation include (i) new metrics that can be used to discover the boundary cases between safe and unsafe driving conditions, (ii) a new approach that combines combinatorial testing and optimization-guided test generation methods, (iii) approaches that utilize global optimization methods and random exploration to generate critical vehicle and pedestrian trajectories for testing purposes, (iv) a publicly-available simulation-based automated vehicle testing framework that enables application of the existing testing approaches in the literature, including the new approaches presented in this dissertation.

Date Created

2019

Agent

Author (aut): Tuncali, Cumhur Erkan
Thesis advisor (ths): Fainekos, Georgios
Committee member: Ben Amor, Heni
Committee member: Kapinski, James
Committee member: Shrivastava, Aviral
Publisher (pbl): Arizona State University

An IoT Solution to Air Quality Monitoring

Description

Pollution is an increasing problem around the world, and one of the main forms it takes is air pollution. Air pollution, from oxides and dioxides to particulate matter, continues to contribute to millions of deaths each year, which is more than the next three leading causes of environment-related death combined. Plus, the problem is only growing as industrial plants, factories, and transportation continues to rapidly increase across the globe. Those most affected include less developed countries and individuals with pre-existing respiratory conditions. Although many citizens know about this issue, it is often unclear what times and locations are worst in terms of pollutant concentration as it can vary on the time of day, local activity, and other variable factors. As a result, citizens lack the knowledge and resources to properly combat or avoid air pollution, as well as the data and evidence to support any sort of regulatory change. Many companies and organizations have tried to address this through Air Quality Indexes (AQIs) but are not focused enough to help the everyday citizen, and often fail to include many significant pollutants. Thus, we sought to address this issue in a cost-effective way through creating a network of IoT (Internet of Things) devices and deploying them in a select area of Tempe, Arizona. We utilized Arduino Microprocessors and Wireless Radio Frequency Transceivers to send and receive air pollution data in real time. Then, displayed this data in such a way that it could be released to the public via web or mobile app. Furthermore, the product is cheap enough to be reproduced and sold in bulk as well as scaled and customized to be compatible with dozens of different air quality sensors.

Date Created

2019-05

Agent

Co-author: Coury, Abrahm Philip
Co-author: Gillespie, Cody
Thesis director: Ren, Fengbo
Committee member: Shrivastava, Aviral
Contributor (ctb): Computer Science and Engineering Program
Contributor (ctb): Computer Science and Engineering Program
Contributor (ctb): Barrett, The Honors College

Concurrent Checkpointing for Embedded Real-Time Systems

Description

The Internet of Things ecosystem has spawned a wide variety of embedded real-time systems that complicate the identification and resolution of bugs in software. The methods of concurrent checkpoint provide a means to monitor the application state with the ability to replay the execution on like hardware and software, without holding off and delaying the execution of application threads. In this thesis, it is accomplished by monitoring physical memory of the application using a soft-dirty page tracker and measuring the various types of overhead when employing concurrent checkpointing. The solution presented is an advancement of the Checkpoint and Replay In Userspace (CRIU) thereby eliminating the large stalls and parasitic operation for each successive checkpoint. Impact and performance is measured using the Parsec 3.0 Benchmark suite and 4.11.12-rt16+ Linux kernel on a MinnowBoard Turbot Quad-Core board.

Date Created

2018

Agent

Author (aut): Prinke, Michael L
Thesis advisor (ths): Lee, Yann-Hang
Committee member: Shrivastava, Aviral
Committee member: Zhao, Ming
Publisher (pbl): Arizona State University

Software Techniques For Dependable Execution

Description

Advances in semiconductor technology have brought computer-based systems intovirtually all aspects of human life. This unprecedented integration of semiconductor based systems in our lives has signiﬁcantly increased the domain and the number

of safety-critical applications – application with unacceptable consequences of failure. Software-level error resilience schemes are attractive because they can provide commercial-oﬀ-the-shelf microprocessors with adaptive and scalable reliability.

Among all software-level error resilience solutions, in-application instruction replication based approaches have been widely used and are deemed to be the most effective. However, existing instruction-based replication schemes only protect some part of computations i.e. arithmetic and logical instructions and leave the rest as unprotected. To improve the eﬃcacy of instruction-level redundancy-based approaches, we developed several error detection and error correction schemes. nZDC (near Zero silent

Data Corruption) is an instruction duplication scheme which protects the execution of whole application. Rather than detecting errors on register operands of memory and control ﬂow operations, nZDC checks the results of such operations. nZDC en

sures the correct execution of memory write instruction by reloading stored value and checking it against redundantly computed value. nZDC also introduces a novel control ﬂow checking mechanism which replicates compare and branch instructions and

detects both wrong direction branches as well as unwanted jumps. Fault injection experiments show that nZDC can improve the error coverage of the state-of-the-art schemes by more than 10x, without incurring any more performance penalty. Further

more, we introduced two error recovery solutions. InCheck is our backward recovery solution which makes light-weighted error-free checkpoints at the basic block granularity. In the case of error, InCheck reverts the program execution to the beginning of last executed basic block and resumes the execution by the aid of preserved in formation. NEMESIS is our forward recovery scheme which runs three versions of computation and detects errors by checking the results of all memory write and branch

operations. In the case of a mismatch, NEMESIS diagnosis routine decides if the error is recoverable. If yes, NEMESIS recovery routine reverts the eﬀect of error from the program state and resumes program normal execution from the error detection

point.

Date Created

2018

Agent

Author (aut): Didehban, Moslem
Thesis advisor (ths): Shrivastava, Aviral
Committee member: Wu, Carole-Jean
Committee member: Clark, Lawrence
Committee member: Mahlke, Scott
Publisher (pbl): Arizona State University

Memory Subsystem Optimization Techniques for Modern High-Performance General-Purpose Processors

Description

General-purpose processors propel the advances and innovations that are the subject of humanity’s many endeavors. Catering to this demand, chip-multiprocessors (CMPs) and general-purpose graphics processing units (GPGPUs) have seen many high-performance innovations in their architectures. With these advances, the memory subsystem has become the performance- and energy-limiting aspect of CMPs and GPGPUs alike. This dissertation identifies and mitigates the key performance and energy-efficiency bottlenecks in the memory subsystem of general-purpose processors via novel, practical, microarchitecture and system-architecture solutions.

Addressing the important Last Level Cache (LLC) management problem in CMPs, I observe that LLC management decisions made in isolation, as in prior proposals, often lead to sub-optimal system performance. I demonstrate that in order to maximize system performance, it is essential to manage the LLCs while being cognizant of its interaction with the system main memory. I propose ReMAP, which reduces the net memory access cost by evicting cache lines that either have no reuse, or have low memory access cost. ReMAP improves the performance of the CMP system by as much as 13%, and by an average of 6.5%.

Rather than the LLC, the L1 data cache has a pronounced impact on GPGPU performance by acting as the bandwidth filter for the rest of the memory subsystem. Prior work has shown that the severely constrained data cache capacity in GPGPUs leads to sub-optimal performance. In this thesis, I propose two novel techniques that address the GPGPU data cache capacity problem. I propose ID-Cache that performs effective cache bypassing and cache line size selection to improve cache capacity utilization. Next, I propose LATTE-CC that considers the GPU’s latency tolerance feature and adaptively compresses the data stored in the data cache, thereby increasing its effective capacity. ID-Cache and LATTE-CC are shown to achieve 71% and 19.2% speedup, respectively, over a wide variety of GPGPU applications.

Complementing the aforementioned microarchitecture techniques, I identify the need for system architecture innovations to sustain performance scalability of GPG- PUs in the face of slowing Moore’s Law. I propose a novel GPU architecture called the Multi-Chip-Module GPU (MCM-GPU) that integrates multiple GPU modules to form a single logical GPU. With intelligent memory subsystem optimizations tailored for MCM-GPUs, it can achieve within 7% of the performance of a similar but hypothetical monolithic die GPU. Taking a step further, I present an in-depth study of the energy-efficiency characteristics of future MCM-GPUs. I demonstrate that the inherent non-uniform memory access side-effects form the key energy-efficiency bottleneck in the future.

In summary, this thesis offers key insights into the performance and energy-efficiency bottlenecks in CMPs and GPGPUs, which can guide future architects towards developing high-performance and energy-efficient general-purpose processors.

Date Created

2018

Agent

Author (aut): Arunkumar, Akhil
Thesis advisor (ths): Wu, Carole-Jean
Committee member: Shrivastava, Aviral
Committee member: Lee, Yann-Hang
Committee member: Bolotin, Evgeny
Publisher (pbl): Arizona State University

Using an Open-Source Solution to Implement a Drone Cyber-Physical System

Description

The goal of this project is to use an open-source solution to implement a drone Cyber-Physical System that can fly autonomously and accurately. The proof-of-concept to analyze the drone's flight capabilities is to fly in a pattern corresponding to the outline of an image, a process that requires both stability and precision to accurately depict the image. In this project, we found that building a Cyber-Physical System is difficult because of the tedious and complex nature of designing and testing the hardware and software solutions of this system. Furthermore, we reflect on the difficulties that arose from using open-source hardware and software.

Date Created

2018-05

Agent

Co-author: Dedinsky, Rachel
Co-author: Lubbers, Harrison James
Thesis director: Shrivastava, Aviral
Committee member: Dougherty, Ryan
Contributor (ctb): Computer Science and Engineering Program
Contributor (ctb): Barrett, The Honors College

Hybrid Multiresolution Simulation & Model Checking: Network-On-Chip Systems

Description

Designers employ a variety of modeling theories and methodologies to create functional models of discrete network systems. These dynamical models are evaluated using verification and validation techniques throughout incremental design stages. Models created for these systems should directly represent their growing complexity with respect to composition and heterogeneity. Similar to software engineering practices, incremental model design is required for complex system design. As a result, models at early increments are significantly simpler relative to real systems. While experimenting (verification or validation) on models at early increments are computationally less demanding, the results of these experiments are less trustworthy and less rewarding. At any increment of design, a set of tools and technique are required for controlling the complexity of models and experimentation.

A complex system such as Network-on-Chip (NoC) may benefit from incremental design stages. Current design methods for NoC rely on multiple models developed using various modeling frameworks. It is useful to develop frameworks that can formalize the relationships among these models. Fine-grain models are derived using their coarse-grain counterparts. Moreover, validation and verification capability at various design stages enabled through disciplined model conversion is very beneficial.

In this research, Multiresolution Modeling (MRM) is used for system level design of NoC. MRM aids in creating a family of models at different levels of scale and complexity with well-formed relationships. In addition, a variant of the Discrete Event System Specification (DEVS) formalism is proposed which supports model checking. Hierarchical models of Network-on-Chip components may be created at different resolutions while each model can be validated using discrete-event simulation and verified via state exploration. System property expressions are defined in the DEVS language and developed as Transducers which can be applied seamlessly for model checking and simulation purposes.

Multiresolution Modeling with verification and validation capabilities of this framework complement one another. MRM manages the scale and complexity of models which in turn can reduces V&V time and effort and conversely the V&V helps ensure correctness of models at multiple resolutions. This framework is realized through extending the DEVS-Suite simulator and its applicability demonstrated for exemplar NoC models.

Date Created

2017

Agent

Author (aut): Gholami, Soroosh
Thesis advisor (ths): Sarjoughian, Hessam S.
Committee member: Fainekos, Georgios
Committee member: Ogras, Umit Y.
Committee member: Shrivastava, Aviral
Publisher (pbl): Arizona State University

Subscribe to Shrivastava, Aviral