In a pursuit-evasion setup where one group of agents tracks down another adversarial group, vision-based algorithms have been known to use techniques such as Linear Dynamic Estimation to determine the probable future location of an evader in a given environment. This gives the pursuer an edge over an evader that has conventionally benefited from the uncertainty of the pursuit. The pursuer can use this knowledge to capture the evader faster than a pursuer that only knows the evader's current location. Inspired by the function of dorsal anterior cingulate cortex (dACC) neurons in natural predators, this work proposes a predictive model, built on an encoder-decoder Long Short-Term Memory (LSTM) network, that can produce a more accurate estimate of the evader's future location. This enables an even quicker capture of a target compared to previously used filtering-based methods. The effectiveness of the approach is evaluated by setting up these agents in an environment built on the Modular Open Robots Simulation Engine (MORSE). Cross-domain adaptability of the method, without the explicit need to retrain the prediction model, is demonstrated by evaluating it in another domain.
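To make the idea of predicting an evader's future location concrete, the following is a minimal sketch of the simplest linear baseline: constant-velocity extrapolation from the last two observed positions. It is not the estimator or the LSTM model from the thesis; the function name `predict_constant_velocity` and the sample track are assumptions made purely for illustration.

```python
def predict_constant_velocity(track, steps_ahead, dt=1.0):
    """Extrapolate a 2-D track by assuming the last observed velocity persists.

    track: list of (x, y) observations taken every `dt` time units (hypothetical data).
    """
    if len(track) < 2:
        raise ValueError("need at least two observations to estimate velocity")
    (x0, y0), (x1, y1) = track[-2], track[-1]
    # Finite-difference velocity from the two most recent observations.
    vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
    horizon = steps_ahead * dt
    return (x1 + vx * horizon, y1 + vy * horizon)


# Hypothetical evader observations.
evader_track = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)]
predicted = predict_constant_velocity(evader_track, steps_ahead=3)
```

A pursuer steering toward `predicted` rather than toward the last observation is the kind of advantage the abstract describes; a learned sequence model would replace the constant-velocity assumption.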
Date Created
The date the item was originally created (prior to any relationship with the ASU Digital Repositories).
In videos that contain actions performed unintentionally, agents do not achieve their desired goals. In such videos, it is challenging for computer vision systems to understand high-level concepts such as goal-directed behavior. On the other hand, from a very early age, humans are able to understand the relation between an agent and their ultimate goal even if the action gets disrupted or unintentional effects occur. Inculcating this ability in artificially intelligent agents would make them better social learners by not just learning from their own mistakes, i.e., reinforcement learning, but also learning from others' mistakes. For example, this could greatly reduce the search space for artificially intelligent agents for finding the correct action sequence when trying to achieve a new goal, since they would be able to learn from others what not to do as well as how and when actions result in undesired outcomes. To evaluate the ability of deep learning models to perform this task, the Weakly Augmented Oops (W-Oops) dataset is proposed, built upon the Oops dataset. W-Oops consists of 2,100 unintentional human action videos, with 44 goal-directed and 33 unintentional video-level activity labels collected through human annotations.
Inspired by previous methods on tasks such as weakly supervised action localization, which show promise for achieving good localization results without ground-truth segment annotations, this paper proposes a weakly supervised algorithm for localizing the goal-directed as well as the unintentional temporal region of a video using only video-level labels. In particular, an attention-based strategy is employed that predicts the temporal regions that contribute the most to a classification task, leveraging solely video-level labels. Meanwhile, the designed overlap regularization allows the model to focus on distinct portions of the video for inferring the goal-directed and unintentional activity, while guaranteeing their temporal ordering. Extensive quantitative experiments verify the validity of the localization method.
A massive volume of data is generated at an unprecedented rate in the information age. The growth of data significantly exceeds the computing and storage capacities of the existing digital infrastructure. In the past decade, many methods have been invented for data compression, compressive sensing and reconstruction, and compressed learning (learning directly upon compressed data) to overcome the data-explosion challenge. While prior works are predominantly model-based, focus on small models, and are not suitable for task-oriented sensing or hardware acceleration, the number of available models for compression-related tasks has escalated by orders of magnitude in the past decade. Motivated by this significant growth and the success of big data, this dissertation proposes to revolutionize both the compressive sensing reconstruction (CSR) and compressed learning (CL) methods from the data-driven perspective. In this dissertation, a series of topics on data-driven CSR are discussed. Individual data-driven models are proposed for the CSR of bio-signals, images, and videos with an improved trade-off between compression ratio and recovery fidelity. Specifically, a scalable Laplacian pyramid reconstructive adversarial network (LAPRAN) is proposed for single-image CSR. LAPRAN progressively reconstructs images following the concept of the Laplacian pyramid through the concatenation of multiple reconstructive adversarial networks (RANs). For the CSR of videos, CSVideoNet is proposed to improve the spatial-temporal resolution of reconstructed videos. Apart from CSR, data-driven CL is discussed in the dissertation. A CL framework is proposed to extract features directly from compressed data for image classification, object detection, and semantic/instance segmentation.
Besides, the spectral bias of neural networks is analyzed from the frequency perspective, leading to a learning-based frequency selection method for identifying the trivial frequency components which can be removed without accuracy loss. Compared with the conventional spatial downsampling approaches, the proposed frequency-domain learning method can achieve higher accuracy with reduced input data size. The methodologies proposed in this dissertation are not restricted to the above-mentioned applications. The dissertation also discusses other potential applications and directions for future research.
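The Laplacian pyramid concept that LAPRAN is said to follow can be illustrated on a one-dimensional signal. The sketch below is only the classical decomposition (a coarse approximation plus per-level residuals, rebuilt coarse to fine); it contains no adversarial networks and is not the dissertation's implementation. The pair-averaging downsampler, the sample-repeating upsampler, and all function names are assumptions chosen for simplicity.

```python
def downsample(signal):
    """Halve the length by averaging adjacent pairs (length must be even)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def upsample(signal):
    """Double the length by repeating each sample."""
    return [value for value in signal for _ in range(2)]


def build_pyramid(signal, levels):
    """Return (residuals ordered fine to coarse, coarsest approximation)."""
    residuals = []
    current = list(signal)
    for _ in range(levels):
        coarse = downsample(current)
        predicted = upsample(coarse)
        # The residual stores the detail lost by going one level coarser.
        residuals.append([c - p for c, p in zip(current, predicted)])
        current = coarse
    return residuals, current


def reconstruct(residuals, coarsest):
    """Rebuild progressively: upsample, then add back each level's detail."""
    current = list(coarsest)
    for residual in reversed(residuals):
        current = [p + r for p, r in zip(upsample(current), residual)]
    return current


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 2.0, 0.0]
residuals, coarsest = build_pyramid(signal, levels=2)
restored = reconstruct(residuals, coarsest)
```

In a learned progressive scheme, each stage would predict the detail for its level rather than read it from stored residuals; the coarse-to-fine order of `reconstruct` is the part that carries over.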
Currently, autonomous vehicles are being evaluated by how well they interact with humans without evaluating how well humans interact with them. Since people are not going to unanimously switch over to using autonomous vehicles, attention must be given to how well these new vehicles signal intent to human drivers from the driver’s point of view. Ineffective communication will lead to unnecessary discomfort among drivers caused by an underlying uncertainty about what an autonomous vehicle is or isn’t about to do. Recent studies suggest that humans tend to fixate on areas of higher uncertainty, so scenarios that have a higher number of vehicle fixations can be reasoned to be more uncertain. We provide a framework for measuring human uncertainty and use the framework to measure the effect of empathetic vs. non-empathetic agents. We used a simulated driving environment to create recorded scenarios and manipulated the autonomous vehicle to include either an empathetic or non-empathetic agent. The driving interaction is composed of two vehicles approaching an uncontrolled intersection. These scenarios were played to twelve participants while their gaze was recorded to track what the participants were fixating on. The overall intent was to provide an analytical framework as a tool for evaluating autonomous driving features; in this case, we chose to evaluate how effective it was to include empathetic behaviors in the autonomous vehicle's decision making. A t-test analysis of the gaze data indicated that empathy did not in fact reduce uncertainty, although additional testing of this hypothesis will be needed due to the small sample size.
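As an illustration of how fixation counts from two conditions might be compared, here is a minimal sketch of Welch's two-sample t statistic using only the standard library. The abstract does not say which t-test variant was used (paired or independent), so the choice of Welch's test is an assumption, and the fixation counts below are invented placeholder values, not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples with unequal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    # statistics.variance is the sample variance (n - 1 denominator).
    standard_error = sqrt(variance(sample_a) / n_a + variance(sample_b) / n_b)
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Hypothetical vehicle-fixation counts per scenario (placeholder values).
fixations_empathetic = [2, 4, 6]
fixations_non_empathetic = [1, 3, 5]
t_statistic = welch_t(fixations_empathetic, fixations_non_empathetic)
```

A t statistic close to zero, as with these placeholder values, would be consistent with no detectable difference between conditions; turning it into a p-value additionally requires the Welch-Satterthwaite degrees of freedom.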
A complex social system, whether artificial or natural, can possess its macroscopic properties as a collective, which may change in real time as a result of local behavioral interactions among a number of agents in it. If a reliable indicator is available to abstract the macrolevel states, decision makers could use it to take a proactive action, whenever needed, in order for the entire system to avoid unacceptable states or converge to desired ones. In realistic scenarios, however, there can be many challenges in learning a model of dynamic global states from interactions of agents, such as 1) high complexity of the system itself, 2) absence of holistic perception, 3) variability of group size, 4) biased observations on state space, and 5) identification of salient behavioral cues. In this dissertation, I introduce useful applications of macrostate estimation in complex multi-agent systems and explore effective deep learning frameworks to address the inherited challenges. First of all, Remote Teammate Localization (ReTLo) is developed in multi-robot teams, in which an individual robot can use its local interactions with a nearby robot as an information channel to estimate the holistic view of the group. Within the problem, I will show (a) learning a model of a modular team can generalize to all others to gain the global awareness of the team of variable sizes, and (b) active interactions are necessary to diversify training data and speed up the overall learning process. The complexity of the next focal system escalates to a colony of over 50 individual ants undergoing 18-day social stabilization since a chaotic event. I will utilize this natural platform to demonstrate, in contrast to (b), (c) monotonic samples only from “before chaos” can be sufficient to model the panicked society, and (d) the model can also be used to discover salient behaviors to precisely predict macrostates.
I present my work on a scalable and programmable I/O controller for region-based computing, which will be used in a rhythmic pixel-based camera pipeline. I provide a breakdown of the development and design of the I/O controller and how it fits into rhythmic pixel regions, along with a study on the memory traffic of rhythmic pixel regions and how this translates to energy efficiency. This rhythmic pixel region-based camera pipeline has been jointly developed through Dr. Robert LiKamWa’s research lab. High spatiotemporal resolutions allow high precision for vision applications, such as detecting features for augmented reality or face detection. High spatiotemporal resolution also comes with high memory throughput, leading to higher energy usage. This creates a tradeoff between high precision and energy efficiency, which becomes more important in mobile systems. In addition, not all pixels in a frame are necessary for the vision application, such as pixels that make up the background. Rhythmic pixel regions aim to reduce the tradeoff by creating a pipeline that allows an application developer to specify regions to capture at a non-uniform spatiotemporal resolution. This is accomplished by encoding the incoming image and sending only the pixels within these specified regions. Later, these encoded representations are decoded to a standard frame representation usable by traditional vision applications. My contribution to this effort has been the design, testing, and evaluation of the I/O controller.
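The encode-then-decode idea described above can be sketched in a few lines. This is a hedged illustration only: the region tuple layout `(top, left, height, width)`, the packed `(row, column, value)` representation, and the function names are assumptions, not the pipeline's actual encoding format, and the temporal (rhythm) dimension is not modeled.

```python
def encode_regions(frame, regions):
    """Keep only the pixels inside the requested regions.

    frame: 2-D list of pixel values.
    regions: list of (top, left, height, width) tuples (hypothetical layout).
    """
    packed = []
    for top, left, height, width in regions:
        for row in range(top, top + height):
            for col in range(left, left + width):
                packed.append((row, col, frame[row][col]))
    return packed


def decode_regions(packed, frame_height, frame_width, fill=0):
    """Rebuild a full-size frame; pixels outside all regions get `fill`."""
    frame = [[fill] * frame_width for _ in range(frame_height)]
    for row, col, value in packed:
        frame[row][col] = value
    return frame


# A 4x4 test frame whose pixel value equals its raster index.
frame = [[row * 4 + col for col in range(4)] for row in range(4)]
packed = encode_regions(frame, [(1, 1, 2, 2)])
decoded = decode_regions(packed, 4, 4)
```

Here only 4 of the 16 pixels are transferred, which is the source of the memory-traffic savings the abstract refers to; the decoded frame keeps the standard shape that a conventional vision application expects.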
Many real-world planning problems can be modeled as Markov Decision Processes (MDPs), which provide a framework for handling uncertainty in the outcomes of action executions. A solution to such a planning problem is a policy that handles possible contingencies that could arise during execution. MDP solvers typically construct policies for a problem instance without reusing information from previously solved instances. Research in generalized planning has demonstrated the utility of constructing algorithm-like plans that reuse such information. However, using such techniques in an MDP setting has not been adequately explored.
This thesis presents a novel approach for learning generalized partial policies that can be used to solve problems with different object names and/or object quantities using very few example policies for learning. This approach uses abstraction for state representation, which allows the identification of patterns in solutions, such as loops, that are agnostic to problem-specific properties. This thesis also presents some theoretical results related to the uniqueness and succinctness of the policies computed using such a representation. The presented algorithm can be used as a fast, yet greedy and incomplete, method for policy computation while falling back to a complete policy search algorithm when needed. Extensive empirical evaluation on discrete MDP benchmarks shows that this approach generalizes effectively and is often able to solve problems much faster than existing state-of-the-art discrete MDP solvers. Finally, the practical applicability of this approach is demonstrated by incorporating it in an anytime stochastic task and motion planning framework to successfully construct free-standing tower structures using Keva planks.
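For readers unfamiliar with what "a policy for an MDP" means in practice, the following sketch runs textbook value iteration on a tiny hand-made MDP and extracts the greedy policy. This is the conventional per-instance style of solving that the thesis contrasts itself with, not the generalized policy learning approach it proposes. The state names, rewards, and the `transitions` dictionary layout are assumptions for illustration.

```python
def action_value(outcomes, values, gamma):
    """Expected return of one action given its (prob, next_state, reward) outcomes."""
    return sum(p * (reward + gamma * values[nxt]) for p, nxt, reward in outcomes)


def value_iteration(transitions, gamma=0.9, tol=1e-9):
    """Iterate Bellman optimality updates until the largest change is below tol."""
    values = {state: 0.0 for state in transitions}
    while True:
        delta = 0.0
        for state, actions in transitions.items():
            if not actions:  # terminal state: value stays 0
                continue
            best = max(action_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            return values


def greedy_policy(transitions, values, gamma=0.9):
    """Pick, in every non-terminal state, the action with the highest expected return."""
    return {
        state: max(actions, key=lambda a: action_value(actions[a], values, gamma))
        for state, actions in transitions.items()
        if actions
    }


# Hypothetical MDP: "risky" may loop back to start, "safe" reaches the goal for less reward.
transitions = {
    "start": {
        "risky": [(0.5, "goal", 10.0), (0.5, "start", 0.0)],
        "safe": [(1.0, "goal", 4.0)],
    },
    "goal": {},
}
values = value_iteration(transitions)
policy = greedy_policy(transitions, values)
```

The resulting policy is tied to this one instance; solving a slightly larger problem would mean rerunning the whole computation, which is the cost that reusing information across instances aims to avoid.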
Image super-resolution (SR) is a low-level image processing task with many applications, such as medical imaging, satellite image processing, and video enhancement. Given a low-resolution image, it aims to reconstruct a high-resolution image. The problem is ill-posed since there can be more than one high-resolution image corresponding to the same low-resolution image. To address this problem, a number of machine learning-based approaches have been proposed. In this dissertation, I present my work on single image super-resolution (SISR) and accelerated magnetic resonance imaging (MRI) (a.k.a. super-resolution on MR images), followed by an investigation of transfer learning for accelerated MRI reconstruction. For SISR, a dictionary-based approach and two reconstruction-based approaches are presented. To be precise, a convex dictionary learning (CDL) algorithm is proposed by constraining the dictionary atoms to be formed by nonnegative linear combinations of the training data, which is a natural, desired property. Also, two reconstruction-based SISR methods are presented, which make use of (i) joint regularization, where a group-residual-based regularization (GRR) and a ridge-regression-based regularization (3R) are combined; and (ii) collaborative representation and non-local self-similarity. After that, two deep learning approaches are proposed, aiming at reconstructing high-quality images from accelerated MRI acquisition. Residual Dense Blocks (RDBs) and feedback connections are introduced in the proposed models. In the last chapter, the feasibility of transfer learning for accelerated MRI reconstruction is discussed.
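The ill-posedness mentioned above is easy to demonstrate on a one-dimensional signal: two different high-resolution signals can produce exactly the same low-resolution observation. The block-averaging degradation model and the sample values are assumptions chosen for illustration; real imaging pipelines typically involve blur and noise as well.

```python
def downsample_by_averaging(signal, factor):
    """Model the degradation as averaging non-overlapping blocks of `factor` samples."""
    return [
        sum(signal[i:i + factor]) / factor
        for i in range(0, len(signal), factor)
    ]


# Two distinct hypothetical high-resolution signals.
high_res_a = [1.0, 3.0, 5.0, 7.0]
high_res_b = [2.0, 2.0, 6.0, 6.0]

low_res_a = downsample_by_averaging(high_res_a, 2)
low_res_b = downsample_by_averaging(high_res_b, 2)
```

Both inputs map to the same low-resolution signal, so the observation alone cannot determine which one was the original. This is why SR methods rely on priors or regularization, such as learned dictionaries or self-similarity, to select a plausible reconstruction.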
This work solves the problem of incorrect rotations while using handheld devices. Two new methods that improve upon previous works are explored. The first method uses an infrared camera to capture and detect the user’s face position and orient the display accordingly. The second method utilizes gyroscopic and accelerometer data as input to a machine learning model to classify correct and incorrect rotations. Experiments show that these new methods achieve an overall success rate of 67% for the first and 92% for the second, which reaches a new high for this performance category. The paper also discusses logistical and legal reasons for implementing this feature in an end-user product from a business perspective. Lastly, the monetary incentive behind a feature like irRotate in a consumer device is discussed and related patents are explored.
The need for incorporating game engines into robotics tools becomes increasingly crucial as their graphics continue to become more photorealistic. This thesis presents a simulation framework, referred to as OpenUAV, that addresses cloud simulation and photorealism challenges in academic and research goals. In this work, OpenUAV is used to create a simulation of an autonomous underwater vehicle (AUV) closely following a moving autonomous surface vehicle (ASV) in an underwater coral reef environment. It incorporates the Unity3D game engine and the robotics software Gazebo to take advantage of Unity3D's perception and Gazebo's physics simulation. The software is developed as a containerized solution that is deployable on cloud and on-premise systems.
This method of utilizing Gazebo's physics and Unity3D perception is evaluated for a team of marine vehicles (an AUV and an ASV) in a coral reef environment. A coordinated navigation and localization module is presented that allows the AUV to follow the path of the ASV. A fiducial marker underneath the ASV facilitates pose estimation of the AUV, and the pose estimates are filtered using the known dynamical system model of both vehicles for better localization. This thesis also investigates different fiducial markers and their detection rates in this Unity3D underwater environment. The limitations and capabilities of this Unity3D perception and Gazebo physics approach are examined.