Roundabout Dilemma Zone Detection with Trajectory Forecasting

Description
In recent years, there has been a growing emphasis on developing automated systems to enhance traffic safety, particularly in the detection of dilemma zones (DZ) at intersections. This study focuses on the automated detection of DZs at roundabouts using trajectory forecasting, presenting an advanced system with perception capabilities. The system utilizes a modular, graph-structured recurrent model that predicts the trajectories of various agents, accounting for agent dynamics and incorporating heterogeneous data such as semantic maps. This enables the system to facilitate traffic management decision-making and improve overall intersection safety. To assess the system's performance, a real-world dataset of traffic roundabout intersections was employed. The experimental results demonstrate that our Superpowered Trajectron++ system exhibits high accuracy in detecting DZ events, with a false positive rate of approximately 10%. Furthermore, the system has the remarkable ability to anticipate and identify dilemma events before they occur, enabling it to provide timely instructions to vehicles. These instructions serve as guidance, determining whether vehicles should come to a halt or continue moving through the intersection, thereby enhancing safety and minimizing potential conflicts. In summary, the development of automated systems for detecting DZs represents an important advancement in traffic safety. The Superpowered Trajectron++ system, with its trajectory forecasting capabilities and incorporation of diverse data sources, showcases improved accuracy in identifying DZ events and can effectively guide vehicles in making informed decisions at roundabout intersections.
Date Created
2023
Agent

Mining IoT Network Traffic in Smart Homes: Traffic Measurement, Pattern Recognition, and Security Applications

Description
Recent advances in cyber-physical systems, artificial intelligence, and cloud computing have driven the widespread deployment of Internet-of-Things (IoT) devices in smart homes. However, the spate of cyber attacks exploiting the vulnerabilities and weak security management of smart home IoT devices has highlighted the urgency and challenges of designing efficient mechanisms for detecting, analyzing, and mitigating security threats against them. In this dissertation, I seek to address the security and privacy issues of smart home IoT devices from the perspectives of traffic measurement, pattern recognition, and security applications. I first propose an efficient multidimensional smart home network traffic measurement framework, which enables me to deeply understand the smart home IoT ecosystem and detect various vulnerabilities and flaws. I further design intelligent schemes to efficiently extract security-related IoT device event and user activity patterns from the encrypted smart home network traffic. Based on the knowledge of how smart homes operate, different systems for securing smart home networks are proposed and implemented, including abnormal network traffic detection across multiple IoT networking protocol layers, smart home safety monitoring with extracted spatial information about IoT device events, and system-level IoT vulnerability analysis and network hardening.
Date Created
2023
Agent

Towards Understanding the Role of Knowledge in Improving Transformer-based Language Models

Description
In natural language processing, language models have achieved remarkable success over the last few years. Transformers are at the core of most of these models. Their success can be mainly attributed to the enormous amount of curated data they are trained on. Even though such language models are trained on massive curated data, they often need specific extracted knowledge to understand and reason better. This is because relevant knowledge is often implicit or missing, which hampers machine reasoning. Apart from that, manual knowledge curation is time-consuming and error-prone. Hence, finding fast and effective methods to extract such knowledge from data is important for improving language models. This leads to finding ideal ways to utilize such knowledge by incorporating it into language models. Successful knowledge extraction and integration lead to an important question of knowledge evaluation of such models by developing tools or introducing challenging test suites to learn about their limitations and improve them further. So, to improve transformer-based models, understanding the role of knowledge becomes important. In the pursuit of improving language models with knowledge, in this dissertation I study three broad research directions spanning the natural language, biomedical, and cybersecurity domains: (1) Knowledge Extraction (KX) - How can transformer-based language models be leveraged to extract knowledge from data? (2) Knowledge Integration (KI) - How can such specific knowledge be used to improve such models? (3) Knowledge Evaluation (KE) - How can language models be evaluated for specific skills to understand their limitations? I propose methods to extract explicit textual, implicit structural, missing textual, and missing structural knowledge from natural language and binary programs using transformer-based language models. I develop ways to improve the language model's multi-step and commonsense reasoning abilities using external knowledge. Finally, I develop challenging datasets that assess their numerical reasoning skills in both in-domain and out-of-domain settings.
Date Created
2023
Agent

Neural Retriever-Reader for Information Retrieval and Question Answering

Description
In the era of information explosion and multi-modal data, information retrieval (IR) and question answering (QA) systems have become essential in daily human activities. IR systems aim to find relevant information in response to user queries, while QA systems provide concise and accurate answers to user questions. IR and QA are two of the most crucial challenges in the realm of Artificial Intelligence (AI), with wide-ranging real-world applications such as search engines and dialogue systems. This dissertation investigates and develops novel models and training objectives to enhance current retrieval systems in textual and multi-modal contexts. Moreover, it examines QA systems, emphasizing generalization and robustness, and creates new benchmarks to promote their progress. Neural retrievers have surfaced as a viable solution, capable of surpassing the constraints of traditional term-matching search algorithms. This dissertation presents Poly-DPR, an innovative multi-vector model architecture that manages test-query, and ReViz, a comprehensive multimodal model to tackle multi-modality queries. By utilizing IR-focused pretraining tasks and producing large-scale training data, the proposed methodology substantially improves the abilities of existing neural retrievers. Concurrently, this dissertation investigates the realm of QA systems, referred to as "readers", by performing an exhaustive analysis of current extractive and generative readers, which results in reliable guidance for selecting readers for downstream applications. Additionally, an original reader (Two-in-One) is designed to effectively choose the pertinent passages and sentences from a pool of candidates for multi-hop reasoning. This dissertation also acknowledges the significance of logical reasoning in real-world applications and develops a comprehensive testbed, LogiGLUE, to further the advancement of reasoning capabilities in QA systems.
Date Created
2023
Agent

From SLAM to Spatial AI: Using Implicit 3D Latent Space Landmark Reconstruction for Robot Localization and Mapping

Description
Simultaneous localization and mapping (SLAM) has traditionally relied on low-level geometric or optical features. However, these feature-based SLAM methods often struggle with feature-less or repetitive scenes. Additionally, low-level features may not provide sufficient information for robot navigation and manipulation, leaving robots without a complete understanding of the 3D spatial world. Advanced information is necessary to address these limitations. Fortunately, recent developments in learning-based 3D reconstruction allow robots to not only detect semantic meanings, but also recognize the 3D structure of objects from a few images. By combining this 3D structural information, SLAM can be improved from a low-level approach to a structure-aware approach. This work proposes a novel approach for multi-view 3D reconstruction using a recurrent transformer. This approach allows robots to accumulate information from multiple views and encode it into a compact latent space. The resulting latent representations are then decoded to produce 3D structural landmarks, which can be used to improve robot localization and mapping.
Date Created
2023
Agent

Compression and Regularization of Vision Transformers

Description
Vision Transformers (ViT) achieve state-of-the-art performance on image classification tasks. However, their massive size makes them unsuitable for edge devices. Unlike CNNs, limited research has been conducted on the compression of ViTs. This thesis work proposes the "adjoined training technique" to compress any transformer-based architecture. The architecture, Adjoined Vision Transformer (AN-ViT), achieves state-of-the-art performance on the ImageNet classification task. With the base network as Swin Transformer, AN-ViT with 4.1× fewer parameters and 5.5× fewer floating point operations (FLOPs) achieves similar accuracy (within 0.15%). This work further proposes Differentiable Adjoined ViT (DAN-ViT), which uses neural architecture search to find hyper-parameters of our model. DAN-ViT outperforms the current state-of-the-art methods, including Swin Transformers, by about 0.07% and achieves 85.27% top-1 accuracy on the ImageNet dataset while using 2.2× fewer parameters and 2.2× fewer FLOPs.
Date Created
2023
Agent

Understanding the Effects of Orthogonal Convolution in Transfer Learning for Medical Image Analysis

Description
Insufficient training data poses significant challenges to training a deep convolutional neural network (CNN) to solve a target task. One common solution to this problem is to use transfer learning with pre-trained networks to apply knowledge learned from one domain with sufficient data to a new domain with limited data and avoid training a deep network from scratch. However, for such methods to work in a transfer learning setting, learned features from the source domain need to be generalizable to the target domain, which is not guaranteed since the feature space and distributions of the source and target data may be different. This thesis aims to explore and understand the use of orthogonal convolutional neural networks to improve learning of diverse, generic features that are transferable to a novel task. In this thesis, orthogonal regularization is used to pre-train deep CNNs to investigate if and how orthogonal convolution may improve feature extraction in transfer learning. Experiments using two limited medical image datasets in this thesis suggest that orthogonal regularization improves generality and reduces redundancy of learned features more effectively in certain deep networks for transfer learning. The results on feature selection and classification demonstrate that the improvement in transferred features helps select more expressive features that improve generalization performance. To understand the effectiveness of orthogonal regularization on different architectures, this work studies the effects of residual learning on orthogonal convolution. Specifically, this work examines the presence of residual connections and their effects on feature similarities and shows that residual learning blocks help orthogonal convolution better preserve feature diversity across convolutional layers of a network and alleviate the increase in feature similarities caused by depth, demonstrating the importance of residual learning in making orthogonal convolution more effective.
Date Created
2023
Agent
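
Illustrative note on the orthogonal regularization mentioned in the abstract above: the following is a minimal sketch, assuming the widely used soft penalty that takes the squared Frobenius norm of (W Wᵀ − I) over the flattened filters of one convolutional layer. The thesis may use a different formulation (for example, orthogonality imposed on the convolution operator itself), and the function and variable names here are hypothetical.

```python
# Hypothetical sketch: soft orthogonality penalty ||W W^T - I||_F^2 on a
# flattened convolution kernel. Names are illustrative, not from the thesis.

def orthogonality_penalty(kernel_rows):
    """Return the squared Frobenius norm of (W W^T - I).

    kernel_rows: list of filters, each flattened to a list of floats
    (one row per output channel).
    """
    n = len(kernel_rows)
    penalty = 0.0
    for i in range(n):
        for j in range(n):
            # Gram entry: inner product of filter i and filter j
            gram = sum(a * b for a, b in zip(kernel_rows[i], kernel_rows[j]))
            # Orthonormal filters give 1 on the diagonal and 0 elsewhere
            target = 1.0 if i == j else 0.0
            penalty += (gram - target) ** 2
    return penalty


# Two orthonormal filters versus two identical (fully redundant) filters
orthonormal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
redundant = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
print(orthogonality_penalty(orthonormal))
print(orthogonality_penalty(redundant))
```

In training, a term like this would typically be added to the task loss with a small weight, so that redundant filters are penalized while the network still fits the data.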

Towards Development of Models that Learn New Tasks from Instructions

Description
Humans have the remarkable ability to solve different tasks by simply reading textual instructions that define the tasks and looking at a few examples. Natural Language Processing (NLP) models built with the conventional machine learning paradigm, however, often struggle to generalize across tasks (e.g., a question-answering system cannot solve classification tasks) despite training with lots of examples. A long-standing challenge in Artificial Intelligence (AI) is to build a model that learns a new task by understanding the human-readable instructions that define it. To study this, I led the development of NATURAL INSTRUCTIONS and SUPERNATURAL INSTRUCTIONS, large-scale datasets of diverse tasks, their human-authored instructions, and instances. I adopt generative pre-trained language models to encode task-specific instructions along with input and generate task output. Empirical results in my experiments indicate that instruction tuning helps models achieve cross-task generalization. This leads to the question: how should good instructions be written? Backed by extensive empirical analysis on large language models, I observe important attributes for successful instructional prompts and propose several reframing techniques for model designers to create such prompts. Empirical results in my experiments show that reframing notably improves few-shot learning performance; this is particularly important on large language models, such as GPT3, where tuning models or prompts on large datasets is expensive. In another experiment, I observe that representing a chain-of-thought instruction for mathematical reasoning questions as a program improves model performance significantly. This observation leads to the development of a large-scale mathematical reasoning model, BHASKAR, and a unified benchmark, LILA. In the case of program synthesis tasks, however, summarizing a question (instead of expanding it as in chain of thought) helps models significantly. 
This thesis also studies instruction-example equivalence, the power of decomposition instructions to replace the need for new models, and the origination of dataset bias from crowdsourcing instructions, in order to better understand the advantages and disadvantages of the instruction paradigm. Finally, I apply the instruction paradigm to match real user needs and introduce a new prompting technique, HELP ME THINK, to help humans perform various tasks by asking questions.
Date Created
2023
Agent

Vision-inspired Representation and Learning for Data-driven Signal Processing

Description
In the era of data explosion, massive data is generated from various sources at an unprecedented speed. The ever-growing amount of data reveals enormous opportunities for developing novel data-driven solutions to unsolved problems. In recent years, benefiting from numerous public datasets and advances in deep learning, data-driven approaches in the computer vision domain have demonstrated superior performance with high adaptability on various data and tasks. Meanwhile, signal processing has long been dominated by techniques derived from rigorous mathematical models built upon prior knowledge of signals. Due to the lack of adaptability to real data and applications, model-based methods often suffer from performance degradation and engineering difficulties. In this dissertation, multiple signal processing problems are studied from vision-inspired data representation and learning perspectives to address the major limitation on adaptability. Corresponding data-driven solutions are proposed to achieve significantly improved performance over conventional solutions. Specifically, in the compressive sensing domain, an open-source image compressive sensing toolbox and benchmark to standardize the implementation and evaluation of reconstruction methods are first proposed. Then a plug-and-play compression ratio adapter is proposed to enable the adaptability of end-to-end data-driven reconstruction methods to variable compression ratios. Lastly, the problem of transfer learning from images to bioelectric signals is experimentally studied to demonstrate the improved performance of data-driven reconstruction. In the image subsampling domain, task-adaptive data-driven image subsampling is studied to reduce data redundancy and retain information of interest simultaneously. In the semiconductor analysis domain, the data-driven automatic error detection problem is studied in the context of integrated circuit segmentation for the first time. 
In the light detection and ranging (LiDAR) camera calibration domain, the calibration accuracy degradation problem in low-resolution LiDAR scenarios is addressed with data-driven techniques.
Date Created
2023
Agent

Generalizing Under Distribution Shifts and Data Scarcity via Geometrical and Knowledge-Aware Deep Learning

Description
This dissertation presents novel solutions for improving the generalization capabilities of deep learning based computer vision models. Neural networks are known to suffer a large drop in performance when tested on samples from a different distribution than the one on which they were trained. The proposed solutions, based on latent space geometry and meta-learning, address this issue by improving the robustness of these models to distribution shifts. Through the use of geometrical alignment, state-of-the-art domain adaptation and source-free test-time adaptation strategies are developed. Additionally, geometrical alignment can allow classifiers to be progressively adapted to new, unseen test domains without requiring retraining of the feature extractors. The dissertation also presents algorithms for enabling in-the-wild generalization without needing access to any samples from the target domain. Other causes of poor generalization, such as data scarcity in critical applications and training data with high levels of noise and variance, are also explored. To address data scarcity in fine-grained computer vision tasks such as object detection, novel context-aware augmentations are suggested. While the first four chapters focus on general-purpose computer vision models, strategies are also developed to improve robustness in specific applications. The efficiency of training autonomous agents for visual navigation is improved by incorporating semantic knowledge, and the integration of domain experts' knowledge allows for the realization of a low-cost, minimally invasive generalizable automated rehabilitation system. Lastly, new tools for explainability and model introspection using counter-factual explainers trained through interval-based uncertainty calibration objectives are presented.
Date Created
2023
Agent