Eleatic: Secure Architecture Across the Edge-to-Cloud Continuum

Description

Many companies face pressure to deploy flexible compute infrastructures to manage their operations. However, the current developments in cloud and edge computing have created a data processing asymmetry challenge. On the edge, workloads frequently require low-latency responses, contend with connectivity and bandwidth instabilities, may require privacy guarantees, and may perform under limited or highly variable compute resources. In the cloud, workloads tolerate longer latency, expect highly available infrastructure, access high-performance compute resources, and have more power available, but may be further from where the processing results are needed. This compute asymmetry challenge requires a new computational paradigm. In this work, I advance a new computing architecture model, called the Continuum Computing Architecture (CCA), and validate this model with a candidate architecture. CCA is a unifying edge-fog-cloud computing model that provides the following capabilities: (i) a continuum of compute that spans from network-connected edge devices to the cloud – with very low power consumption to high-performance compute; (ii) the same architecture with different micro-architectures along this compute continuum – a single RISC-V instruction set architecture with reconfigurable processing units; (iii) portability across all scales – the same program can be run across the continuum with different latencies and power utilizations; and (iv) fully supported secure shared memory – physical memories along the continuum are abstracted to allow edge and cloud to share data transparently. The validating architecture has three micro-architectures. The edge micro-architecture, Parmenides, targets accelerator-based edge processing system-on-chips (SoCs). Parmenides includes security features to protect the SoC in uncontrolled environments while adapting its power usage and processing to ambient events. The fog and cloud micro-architectures, Melissus and Zeno, must support application data distribution across the memory of many compute nodes to achieve the desired scale and performance. As a solution, I introduce the Eleatic Memory Model (EMM): a global shared memory architecture with hardware-supported global memory access permissions. All memory accesses are made through a namespace-based capability scheme that supports improved scalability and memory security. The CCA model addresses several memory-centric security challenges, including the misuse of resources, risks to application and data integrity, and concerns over authorization and confidentiality.
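The abstract does not spell out the EMM's capability encoding; as a minimal illustrative sketch (all names hypothetical), a capability could bind a namespace identifier to a permission set that is checked on every load and store:

```python
# Illustrative sketch of a namespace-based capability check (hypothetical
# names; the actual EMM hardware scheme is not specified in this abstract).
from dataclasses import dataclass
from enum import Flag, auto

class Perm(Flag):
    READ = auto()
    WRITE = auto()

@dataclass(frozen=True)
class Capability:
    namespace: int   # which global namespace this capability grants access to
    perms: Perm      # permissions granted within that namespace

class EleaticMemory:
    """Toy model: global shared memory partitioned into namespaces."""
    def __init__(self):
        self.store = {}  # (namespace, addr) -> value

    def load(self, cap: Capability, addr: int):
        if Perm.READ not in cap.perms:
            raise PermissionError("capability lacks READ on namespace")
        return self.store.get((cap.namespace, addr))

    def store_word(self, cap: Capability, addr: int, value: int):
        if Perm.WRITE not in cap.perms:
            raise PermissionError("capability lacks WRITE on namespace")
        self.store[(cap.namespace, addr)] = value

mem = EleaticMemory()
cap = Capability(namespace=7, perms=Perm.READ | Perm.WRITE)
mem.store_word(cap, 0x1000, 42)
assert mem.load(cap, 0x1000) == 42
```

In the actual architecture such checks would be enforced in hardware along the continuum; the toy above only captures the permission-gating idea.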
Date Created
2022

EdgeFaaS: A Function-based Framework for Edge Computing

Description

The rapid growth of data generated by Internet of Things (IoT) devices such as smartphones and smart home devices presents new challenges to cloud computing in transferring, storing, and processing the data. With increasingly powerful edge devices, edge computing, on the other hand, has the potential to improve responsiveness, privacy, and cost efficiency. However, resources across the cloud and edge are highly distributed and highly diverse. To address these challenges, this paper proposes EdgeFaaS, a Function-as-a-Service (FaaS) based computing framework that supports the flexible, convenient, and optimized use of distributed and heterogeneous resources across IoT, edge, and cloud systems. EdgeFaaS allows cluster resources and individual devices to be managed under the same framework and to provide computational and storage resources for functions. It provides virtual function and virtual storage interfaces for consistent function and storage management across heterogeneous compute and storage resources, and it automatically optimizes the scheduling of functions and the placement of data according to their performance and privacy requirements. EdgeFaaS is evaluated on two edge workflows: a video analytics workflow and a federated learning workflow, both of which are representative edge applications that involve large amounts of input data generated by edge devices.
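The scheduling policy itself is not detailed in this abstract; a minimal sketch of privacy- and latency-aware function placement under assumed interfaces (all names hypothetical) might look like:

```python
# Hypothetical sketch of privacy/latency-aware function placement, in the
# spirit of EdgeFaaS; the framework's real scheduler is not shown here.
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    tier: str            # "edge" or "cloud"
    est_latency_ms: float

@dataclass
class Function:
    name: str
    privacy_sensitive: bool   # must stay on the edge if True
    deadline_ms: float

def place(fn: Function, resources: list[Resource]) -> Resource:
    # Privacy-sensitive functions are restricted to edge resources.
    candidates = [r for r in resources
                  if not (fn.privacy_sensitive and r.tier != "edge")]
    # Among feasible resources, pick the one that meets the deadline
    # with the lowest estimated latency; otherwise fall back to best effort.
    feasible = [r for r in candidates if r.est_latency_ms <= fn.deadline_ms]
    pool = feasible or candidates
    return min(pool, key=lambda r: r.est_latency_ms)

resources = [Resource("pi-cluster", "edge", 40.0),
             Resource("ec2-gpu", "cloud", 120.0)]
print(place(Function("face_blur", True, 100.0), resources).name)  # pi-cluster
```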
Date Created
2021

Compiler Design for Accelerating Applications on Coarse-Grained Reconfigurable Architectures

Description

Coarse-Grained Reconfigurable Arrays (CGRAs) are emerging accelerators that promise low-power acceleration of compute-intensive loops in applications. The acceleration achieved by a CGRA relies on the CGRA compiler efficiently mapping the compute-intensive loops onto the CGRA. The CGRA mapping problem, being NP-complete, is solved in a two-step process: scheduling and mapping. The scheduling algorithm allocates timeslots to the nodes of the data flow graph (DFG), and the mapping algorithm maps the scheduled nodes onto the processing elements (PEs) of the CGRA. On a mapping failure, the initiation interval (II) is increased, and a new schedule is obtained for the increased II. Most previous mapping techniques use the Iterative Modulo Scheduling (IMS) algorithm to find a schedule for a given II. Since IMS generates a resource-constrained ASAP (as-soon-as-possible) schedule, even with an increased II it tends to generate a similar schedule that is still not mappable, and it does not explore the schedule space effectively. The problems encountered by IMS-based scheduling algorithms are explored, and an improved randomized algorithm for scheduling the application loop to be accelerated is proposed. When encountering a mapping failure for a given schedule, existing mapping algorithms either exit and retry the mapping anew, or recursively remove the previously mapped node to find a valid mapping (backtracking). Abandoning the mapping is extreme, but even backtracking may not be the best choice, since the root of the problem may not be the previous node. The challenges in existing algorithms are systematically analyzed, and a failure-aware mapping algorithm is presented. The loops in general-purpose applications are often complicated, i.e., loops with perfect and imperfect nests and loops with nested if-then-elses (conditionals). The existing hardware-software solutions for executing branches and conditionals are inefficient. A co-design approach that efficiently executes complicated loops on a CGRA is proposed: the compiler transforms complex loops, maps them to the CGRA, and lays them out in memory in a specific manner, such that the hardware can fetch and execute the instructions from the correct path at runtime. Finally, an open-source CGRA compilation and simulation framework is presented. It is based on LLVM and gem5 and can extract loops, map them onto the CGRA architecture, and execute them on the CGRA as a co-processor to an ARM CPU.
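As a rough sketch of the schedule-then-map flow with II escalation described above (a toy stand-in, not the thesis's algorithm; the resource model is deliberately simplified):

```python
# Toy randomized modulo scheduling with II escalation (illustrative only).
import random
from collections import Counter

def randomized_modulo_schedule(topo_nodes, deps, ii, num_pes, tries=100):
    """topo_nodes: DFG nodes in topological order; deps[n]: predecessors.
    Each node needs a slot at least one cycle after all predecessors, and
    at most num_pes nodes may share the same slot modulo II (one PE per
    node per cycle).  A random slot offset, rather than pure ASAP
    placement, lets retries explore different schedules."""
    for _ in range(tries):
        slot, load, ok = {}, Counter(), True
        for n in topo_nodes:
            earliest = 1 + max((slot[p] for p in deps.get(n, ())), default=-1)
            for t in range(earliest + random.randint(0, ii - 1),
                           earliest + 2 * ii):
                if load[t % ii] < num_pes:
                    slot[n] = t
                    load[t % ii] += 1
                    break
            else:
                ok = False
                break
        if ok:
            return slot
    return None  # caller relaxes the II and reschedules

def compile_loop(topo_nodes, deps, num_pes, max_ii=16):
    ii = 1
    while ii <= max_ii:
        schedule = randomized_modulo_schedule(topo_nodes, deps, ii, num_pes)
        if schedule is not None:
            return ii, schedule  # mapping onto concrete PEs would follow
        ii += 1  # scheduling/mapping failure: increase the II
    raise RuntimeError("loop not schedulable within the II budget")

print(compile_loop(["a", "b", "c", "d"],
                   {"b": ["a"], "c": ["a"], "d": ["b", "c"]}, num_pes=2))
```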
Date Created
2021

Making a Real-Time Operating System for the Raspberry Pi 2B

Description

Real-Time Operating Systems are used in a variety of applications ranging from autonomous vehicles, flight controllers, and energy management systems to pacemakers, satellite tracking systems, amateur robotics, and much more. While general-purpose computers can perform tasks quite quickly, the execution time of a given process varies noticeably from one execution to the next. This execution-time variation poses a big challenge for many computer-controlled systems that operate in the real world, such as robots, autonomous vehicles, drones, and traffic signals. Execution-time variation matters in these systems because they must interact with the real world and perform actions at the proper times; executing these tasks at other times can have effects ranging from a minor inconvenience to catastrophic failure. Many of these real-time systems, such as pacemakers, are built around single-board computers. One single-board computer that is popular among hobbyists for its form factor, cost, and performance is the Raspberry Pi, which uses an ARM-based processor. To provide a Real-Time Operating System for this single-board computer, this paper presents Jobbed, a single-core Real-Time Operating System with a fixed-priority preemptive scheduler, targeted at the Raspberry Pi 2B. In this paper, we present the algorithmic structure behind this system and compare it to the Raspbian operating system in an array of performance and behavioral tests targeted toward proper Real-Time Operating Systems.
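Jobbed's source is not shown here; the fixed-priority preemptive policy it names can be sketched as follows (a toy model, not Jobbed's implementation):

```python
# Toy fixed-priority preemptive scheduler: of all ready tasks, the
# highest-priority one always runs; a newly released higher-priority task
# preempts the current one (lower number = higher priority).
import heapq

class FixedPriorityScheduler:
    def __init__(self):
        self.ready = []      # min-heap of (priority, name)
        self.running = None  # currently executing (priority, name)

    def release(self, priority, name):
        heapq.heappush(self.ready, (priority, name))
        # Preempt if the new task outranks the running one.
        if self.running and priority < self.running[0]:
            heapq.heappush(self.ready, self.running)
            self.running = None
        if self.running is None and self.ready:
            self.running = heapq.heappop(self.ready)

    def finish_current(self):
        # Current task completed: dispatch the next highest-priority task.
        self.running = heapq.heappop(self.ready) if self.ready else None

s = FixedPriorityScheduler()
s.release(2, "telemetry")      # starts running
s.release(1, "motor-control")  # preempts telemetry
print(s.running)               # (1, 'motor-control')
```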
Date Created
2022-05

Formalizing Safety, Perception, and Mission Requirements for Testing and Planning in Autonomous Vehicles

Description

Autonomous Vehicles (AVs) are inevitable entities in future mobility systems that demand safety and adaptability as two critical factors in replacing or assisting human drivers. Safety arises in defining, standardizing, quantifying, and monitoring requirements for all autonomous components. Adaptability, on the other hand, involves efficient handling of uncertainty and inconsistencies in models and data. First, I address safety by presenting a search-based test-case generation framework that can be used in training and testing deep-learning components of AVs. Next, to address adaptability, I propose a framework based on multi-valued linear temporal logic syntax and semantics that allows autonomous agents to perform model checking on systems with uncertainties. The search-based test-case generation framework provides safety-assurance guarantees by formalizing and monitoring Responsibility Sensitive Safety (RSS) rules. I use the RSS rules, expressed in signal temporal logic, as qualification specifications for monitoring and screening the quality of generated test-drive scenarios. Furthermore, to extend the expressivity of existing temporal formal languages, I propose a new spatio-temporal perception logic that enables formalizing qualification specifications for perception systems. Taken together, my test-generation framework can be used for reasoning about the quality of perception, prediction, and decision-making components in AVs. Finally, my efforts resulted in publicly available software. One tool is an offline monitoring algorithm based on the proposed logic that reasons about the quality of perception systems. The other is an optimal planner (model checker) that accepts mission specifications and model descriptions in the form of multi-valued logic and multi-valued sets, respectively. My monitoring framework is distributed with the publicly available S-TaLiRo and Sim-ATAV tools.
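As one concrete example of an RSS rule that such a monitor can check, the published minimum safe longitudinal following distance (Shalev-Shwartz et al.) can be computed directly; the parameter values below are illustrative only:

```python
def rss_safe_longitudinal_distance(v_rear, v_front, rho=1.0,
                                   a_max_accel=3.0, b_min_brake=4.0,
                                   b_max_brake=8.0):
    """RSS minimum safe following distance: in the worst case, the rear
    car accelerates at a_max_accel during the response time rho, then
    brakes at only b_min_brake, while the front car brakes at the full
    b_max_brake.  Units: m, m/s, m/s^2 (illustrative parameter values)."""
    d = (v_rear * rho
         + 0.5 * a_max_accel * rho ** 2
         + (v_rear + rho * a_max_accel) ** 2 / (2 * b_min_brake)
         - v_front ** 2 / (2 * b_max_brake))
    return max(d, 0.0)

# A monitor would flag any sampled trace where the actual gap drops below
# this bound while both vehicles share a lane.
print(round(rss_safe_longitudinal_distance(v_rear=20.0, v_front=15.0), 1))
```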
Date Created
2021

A Methodology and Formalism to Handle Timing Uncertainties in Cyber-Physical Systems

Description

Uncertainty is intrinsic in Cyber-Physical Systems (CPS), since they interact with humans and work across both the analog and digital worlds. Because even a minute deviation from the true values can cause a catastrophe in a safety-critical application, considering uncertainties in CPS behavior is essential. At the same time, time is a foundational aspect of CPS. Correct timing of system events is critical to optimize responsiveness to the environment, in terms of timeliness, accuracy, and precision in the knowledge, measurement, prediction, and control of CPS behavior. To design a more resilient and reliable CPS, first and foremost there should be a way to specify the timing constraints that a constructed Cyber-Physical System must meet in the presence of existing uncertainties. Only then can we seek systematic approaches to check whether all timing constraints are being met, and develop correct-by-construction methodologies. To this end, Timestamp Temporal Logic (TTL) is developed to specify the timing constraints of a distributed CPS. With TTL, designers can specify the timing requirements that a CPS must satisfy in a succinct and intuitive manner and express the tolerable error as part of the language. The proposed deduction system on TTL (the TTL reasoning system) provides the ability to check the consistency of the expressed system specifications and to simplify them for implementation on an FPGA for runtime verification. For CPS runtime verification, a Timestamp-based Monitoring Approach (TMA) has been designed that can hook up to a CPS, take its timing specifications in TTL, and verify whether the timing constraints are met in the presence of the system's uncertainties. TMA does not need to compute whether a constraint is met at each and every instant of time; instead, it re-evaluates a constraint only when an event occurs that can affect the outcome. This enables online timing monitoring of CPS with less computation and fewer resources. Furthermore, the minimum design parameters of a CPS that are required to enable testing its timing are defined in this dissertation.
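TMA's realization is not given in this abstract; the event-driven idea, re-evaluating a constraint only on relevant events, can be sketched for a single response-time constraint (hypothetical constraint and names):

```python
class DeadlineMonitor:
    """Toy event-driven monitor for one timing constraint, in the spirit
    of TMA (illustrative, not the dissertation's formalism): every 'req'
    event must be followed by a 'resp' within `bound` seconds, with a
    tolerable timestamp error `eps`.  The check runs only when a relevant
    event arrives, not on every clock tick."""
    def __init__(self, bound, eps):
        self.bound, self.eps = bound, eps
        self.pending = []  # timestamps of unanswered requests

    def on_event(self, name, ts):
        if name == "req":
            self.pending.append(ts)
        elif name == "resp" and self.pending:
            start = self.pending.pop(0)
            # Timestamp uncertainty eps widens the acceptable window.
            if ts - start > self.bound + self.eps:
                return f"VIOLATION: response after {ts - start:.3f}s"
        return "ok"

m = DeadlineMonitor(bound=0.010, eps=0.001)
print(m.on_event("req", 0.000))    # ok
print(m.on_event("resp", 0.015))   # VIOLATION: 15 ms > 10 ms + 1 ms
```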
Date Created
2021

Safe and Robust Cooperative Algorithm for Connected Autonomous Vehicles

Description

Autonomous Vehicles (AVs) have the potential to significantly transform transportation. AVs are expected to make transportation safer by avoiding accidents that happen due to human error. When AVs become connected, they can exchange information with the infrastructure or with other Connected Autonomous Vehicles (CAVs) to efficiently plan their future motion and therefore increase road throughput and reduce energy consumption. Cooperative algorithms for CAVs will not be deployed in real life unless they are proven to be safe, robust, and resilient to different failure models. Since intersections are crucial areas where most accidents happen, this dissertation first focuses on making existing intersection-management algorithms safe and resilient against network and computation delays, bounded model mismatches and external disturbances, and the existence of a rogue vehicle. Then, a generic algorithm for conflict resolution and cooperation of CAVs is proposed that ensures the safety of vehicles even when other vehicles suddenly change their plans. The proposed approach can also detect deadlock situations among CAVs and resolve them through a negotiation process. A testbed consisting of 1/10th-scale model CAVs is built to evaluate the proposed algorithms. In addition, a simulator is developed to perform tests at a large scale. Results from the conducted experiments indicate the robustness and resilience of the proposed approaches.
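The negotiation protocol is not detailed here, but the deadlock-detection step can be sketched as cycle detection over a wait-for graph among vehicles (an illustrative toy, not the dissertation's exact model):

```python
def find_deadlock(waits_for):
    """Toy cycle detection over a vehicle wait-for graph: vehicle u waits
    for vehicle v when v blocks u's planned path.  A cycle means no
    vehicle in it can ever proceed, i.e. a deadlock that must be broken
    by negotiation."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in waits_for}

    def dfs(u, path):
        color[u] = GRAY
        for v in waits_for.get(u, ()):
            if color.get(v) == GRAY:                  # back edge: cycle found
                return path[path.index(v):] + [v]
            if color.get(v) == WHITE:
                cycle = dfs(v, path + [v])
                if cycle:
                    return cycle
        color[u] = BLACK
        return None

    for v in list(waits_for):
        if color[v] == WHITE:
            cycle = dfs(v, [v])
            if cycle:
                return cycle
    return None

# Three vehicles each waiting on the next at a three-way conflict: deadlock.
print(find_deadlock({"A": ["B"], "B": ["C"], "C": ["A"]}))  # ['A','B','C','A']
```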
Date Created
2021

Dash Database: Structured Kernel Data For The Machine Understanding of Computation

Description

As device and voltage scaling cease, ever-increasing performance targets can only be achieved through the design of parallel, heterogeneous architectures. The workloads targeted by these domain-specific architectures must be designed to leverage the strengths of the platform: a task that has proven to be extremely difficult and expensive. Machine learning has the potential to automate this process by understanding the features of computation that optimize device utilization and throughput. Unfortunately, applications of this technique have relied on small data sets and narrow feature extraction, limiting the impact of their contributions.

To address this problem, I present Dash-Database: a repository of C and C++ programs for software-defined radio applications and neighboring fields; a methodology for structuring the features of computation using kernels; and a set of evaluation metrics to standardize computation data sets. Dash-Database contributes a general data set that supports machine understanding of computation and standardizes the input corpus utilized for machine learning of computation, for which currently only a small set of benchmarks and features is used. I present an evaluation of Dash-Database using three novel metrics: breadth, depth, and richness. Compared to a data set largely representative of those used in prior work, Dash-Database shows a 5x increase in breadth, a 40x increase in depth, and a rich set of sample features. Using Dash-Database, the broader community can work toward a general machine understanding of computation that can automate the design of workloads for domain-specific computation.
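The formal definitions of breadth, depth, and richness are not reproduced in this abstract; one plausible (purely hypothetical) reading, offered only to make the intent concrete, is:

```python
# Hypothetical interpretation of the three data-set metrics; the paper's
# formal definitions are not given in this abstract.
from collections import Counter

def dataset_metrics(samples):
    """Each sample: (kernel_class, feature_dict)."""
    classes = Counter(cls for cls, _ in samples)
    breadth = len(classes)                           # distinct kernel classes
    depth = min(classes.values()) if classes else 0  # worst-case samples/class
    richness = len({f for _, feats in samples for f in feats})  # feature kinds
    return breadth, depth, richness

samples = [("fft", {"flops": 1e6, "bytes": 4e5}),
           ("fir", {"flops": 2e5, "taps": 64}),
           ("fft", {"flops": 8e5, "bytes": 3e5})]
print(dataset_metrics(samples))   # (2, 1, 3)
```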
Date Created
2020

Cooperative Driving of Connected Autonomous Vehicles Using Responsibility Sensitive Safety Rules

Description

In recent times, traffic congestion and motor accidents have been major problems for transportation in major cities, and Intelligent Transportation Systems have the potential to be an effective solution. Connected Autonomous Vehicles (CAVs) can cooperate at intersections, ramp merges, lane changes, and other conflicting scenarios to resolve conflicts and avoid collisions with other vehicles. Many works have been proposed for specific scenarios such as intersections, ramp merging, or lane changing, which only partially solve the conflict-resolution problem. Moreover, one of the major issues in autonomous decision making, deadlocks, has not been considered in some of these works: the existing works either do not consider deadlocks or lack a safety proof. This thesis proposes a cooperative driving solution that provides complete navigation, conflict resolution, and deadlock resolution for connected autonomous vehicles. A graph-based model is used to resolve deadlocks between vehicles, and the Responsibility Sensitive Safety (RSS) rules are used to ensure the safety of the autonomous vehicles during conflict detection and resolution. The algorithm provides a complete navigation solution for an autonomous vehicle from its source to its destination, ensures that accidents do not occur even in the worst-case scenario, and keeps decision making deadlock-free.
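The thesis's exact protocol is not reproduced here; as an illustrative toy, conflict resolution at a shared region can be modeled as an ordered reservation that only one vehicle holds at a time, with waiting vehicles maintaining RSS-safe gaps:

```python
# Toy FIFO reservation of a shared conflict region (e.g. an intersection
# box); illustrative, not the thesis's actual protocol.
from collections import deque

class ConflictZone:
    def __init__(self):
        self.queue = deque()   # vehicles in arrival order
        self.occupant = None   # vehicle currently inside the zone

    def request(self, vehicle):
        if vehicle not in self.queue:
            self.queue.append(vehicle)

    def try_enter(self, vehicle):
        # A vehicle enters only when the zone is free and it holds
        # precedence, ruling out two conflicting trajectories at once.
        if self.occupant is None and self.queue and self.queue[0] == vehicle:
            self.occupant = self.queue.popleft()
            return True
        return False  # keep an RSS-safe gap and wait

    def leave(self, vehicle):
        if self.occupant == vehicle:
            self.occupant = None

z = ConflictZone()
z.request("car1"); z.request("car2")
assert z.try_enter("car2") is False   # car1 holds precedence
assert z.try_enter("car1") is True
```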
Date Created
2020

Power, Performance, and Energy Management of Heterogeneous Architectures

Description

Modern many-core multiprocessor systems-on-chip (SoCs) offer tremendous power and performance optimization opportunities by tuning thousands of potential voltage, frequency, and core configurations. Applications running on these architectures are becoming increasingly complex. As the basic building blocks that make up an application change during runtime, different configurations may become optimal with respect to power, performance, or other metrics. Identifying the optimal configuration at runtime is a daunting task due to the large number of workloads and configurations. Therefore, there is a strong need to evaluate the metrics of interest as a function of the supported configurations.

This thesis focuses on two different types of modern multiprocessor SoCs: mobile heterogeneous systems and the tile-based Intel Xeon Phi architecture.

For mobile heterogeneous systems, this thesis presents a novel methodology that can accurately instrument different types of applications with specific performance-monitoring calls. These calls provide a rich set of performance statistics at the basic-block level while the application runs on the target platform. The target architecture used for this work (Odroid XU3) is capable of running at 4940 different frequency and core combinations. With the help of the instrumented applications, a vast amount of characterization data is collected that provides details about performance, power, and CPU state at every instrumented basic block across 19 different types of applications. The data collected has enabled two runtime schemes. The first provides a methodology to find optimal configurations in heterogeneous architectures using classifiers, and demonstrates an average increase of 93%, 81%, and 6% in performance per watt compared to the interactive, ondemand, and powersave governors, respectively. The second, using the same data, presents a novel imitation learning framework for dynamically controlling the type, number, and frequencies of active cores to achieve an average performance-per-watt (PPW) improvement of 109% compared to the default governors.

This work also presents how to accurately profile the tile-based Intel Xeon Phi architecture while training different types of neural networks on an open image dataset using a deep learning framework. The data collected allows deep exploratory analysis and showcases how different hardware parameters affect the performance of the Xeon Phi.
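The classifiers themselves are not described in this abstract; a minimal sketch of the runtime selection step, with stand-in predictors (hypothetical interfaces), is:

```python
def best_configuration(configs, predict_perf, predict_power):
    """Toy runtime policy in the spirit of the first scheme: given the
    current basic block's features, pick the (core count, frequency)
    pair whose predicted performance-per-watt is highest.  The two
    predictors stand in for the trained classifiers."""
    return max(configs,
               key=lambda c: predict_perf(c) / max(predict_power(c), 1e-9))

# Illustrative stand-in models: performance saturates with cores and
# frequency, while power grows roughly with cores * frequency^2 (DVFS-like).
configs = [(cores, f) for cores in (1, 2, 4) for f in (0.6, 1.2, 2.0)]
perf = lambda c: min(c[0], 3) * c[1] ** 0.8
power = lambda c: 0.5 + c[0] * c[1] ** 2
print(best_configuration(configs, perf, power))  # (2, 0.6) for these models
```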
Date Created
2019