Haptic Vision: Augmenting Non-visual Travel Tools, Techniques, and Methods by Increasing Spatial Knowledge Through Dynamic Haptic Interactions

158792-Thumbnail Image.png
Description
Access to real-time situational information including the relative position and motion of surrounding objects is critical for safe and independent travel. Object or obstacle (OO) detection at a distance is primarily a task of the visual system due to the

Access to real-time situational information including the relative position and motion of surrounding objects is critical for safe and independent travel. Object or obstacle (OO) detection at a distance is primarily a task of the visual system due to the high resolution information the eyes are able to receive from afar. As a sensory organ in particular, the eyes have an unparalleled ability to adjust to varying degrees of light, color, and distance. Therefore, in the case of a non-visual traveler, someone who is blind or low vision, access to visual information is unattainable if it is positioned beyond the reach of the preferred mobility device or outside the path of travel. Although, the area of assistive technology in terms of electronic travel aids (ETA’s) has received considerable attention over the last two decades; surprisingly, the field has seen little work in the area focused on augmenting rather than replacing current non-visual travel techniques, methods, and tools. Consequently, this work describes the design of an intuitive tactile language and series of wearable tactile interfaces (the Haptic Chair, HaptWrap, and HapBack) to deliver real-time spatiotemporal data. The overall intuitiveness of the haptic mappings conveyed through the tactile interfaces are evaluated using a combination of absolute identification accuracy of a series of patterns and subjective feedback through post-experiment surveys. Two types of spatiotemporal representations are considered: static patterns representing object location at a single time instance, and dynamic patterns, added in the HaptWrap, which represent object movement over a time interval. Results support the viability of multi-dimensional haptics applied to the body to yield an intuitive understanding of dynamic interactions occurring around the navigator during travel. Lastly, it is important to point out that the guiding principle of this work centered on providing the navigator with spatial knowledge otherwise unattainable through current mobility techniques, methods, and tools, thus, providing the \emph{navigator} with the information necessary to make informed navigation decisions independently, at a distance.
Date Created
2020
Agent

Hana: An Open-Domain Chatbot Application for Language Learning

130936-Thumbnail Image.png
Description
Learning a new language can be very challenging. One significant aspect of learning a language is learning how to have fluent verbal and written conversations with other people in that language. However, it can be difficult to find other people

Learning a new language can be very challenging. One significant aspect of learning a language is learning how to have fluent verbal and written conversations with other people in that language. However, it can be difficult to find other people available with whom to practice conversations. Additionally, total beginners may feel uncomfortable and self-conscious when speaking the language with others. In this paper, I present Hana, a chatbot application powered by deep learning for practicing open-domain verbal and written conversations in a variety of different languages. Hana uses a pre-trained medium-sized instance of Microsoft's DialoGPT in order to generate English responses to user input translated into English. Google Cloud Platform's Translation API is used to handle translation to and from the language selected by the user. The chatbot is presented in the form of a browser-based web application, allowing users to interact with the chatbot in both a verbal or text-based manner. Overall, the chatbot is capable of having interesting open-domain conversations with the user in languages supported by the Google Cloud Translation API, but response generation can be delayed by several seconds, and the conversations and their translations do not necessarily take into account linguistic and cultural nuances associated with a given language.
Date Created
2020-12

Anticipatory and Invisible Interfaces to Address Impaired Proprioception in Neurological Disorders

158436-Thumbnail Image.png
Description
The burden of adaptation has been a major limiting factor in the adoption rates of new wearable assistive technologies. This burden has created a necessity for the exploration and combination of two key concepts in the development of upcoming wearables:

The burden of adaptation has been a major limiting factor in the adoption rates of new wearable assistive technologies. This burden has created a necessity for the exploration and combination of two key concepts in the development of upcoming wearables: anticipation and invisibility. The combination of these two topics has created the field of Anticipatory and Invisible Interfaces (AII)

In this dissertation, a novel framework is introduced for the development of anticipatory devices that augment the proprioceptive system in individuals with neurodegenerative disorders in a seamless way that scaffolds off of existing cognitive feedback models. The framework suggests three main categories of consideration in the development of devices which are anticipatory and invisible:

• Idiosyncratic Design: How do can a design encapsulate the unique characteristics of the individual in the design of assistive aids?

• Adaptation to Intrapersonal Variations: As individuals progress through the various stages of a disability
eurological disorder, how can the technology adapt thresholds for feedback over time to address these shifts in ability?

• Context Aware Invisibility: How can the mechanisms of interaction be modified in order to reduce cognitive load?

The concepts proposed in this framework can be generalized to a broad range of domains; however, there are two primary applications for this work: rehabilitation and assistive aids. In preliminary studies, the framework is applied in the areas of Parkinsonian freezing of gait anticipation and the anticipation of body non-compliance during rehabilitative exercise.
Date Created
2020
Agent

Generalized Domain Adaptation for Visual Domains

158278-Thumbnail Image.png
Description
Humans have a great ability to recognize objects in different environments irrespective of their variations. However, the same does not apply to machine learning models which are unable to generalize to images of objects from different domains. The generalization of

Humans have a great ability to recognize objects in different environments irrespective of their variations. However, the same does not apply to machine learning models which are unable to generalize to images of objects from different domains. The generalization of these models to new data is constrained by the domain gap. Many factors such as image background, image resolution, color, camera perspective and variations in the objects are responsible for the domain gap between the training data (source domain) and testing data (target domain). Domain adaptation algorithms aim to overcome the domain gap between the source and target domains and learn robust models that can perform well across both the domains.

This thesis provides solutions for the standard problem of unsupervised domain adaptation (UDA) and the more generic problem of generalized domain adaptation (GDA). The contributions of this thesis are as follows. (1) Certain and Consistent Domain Adaptation model for closed-set unsupervised domain adaptation by aligning the features of the source and target domain using deep neural networks. (2) A multi-adversarial deep learning model for generalized domain adaptation. (3) A gating model that detects out-of-distribution samples for generalized domain adaptation.

The models were tested across multiple computer vision datasets for domain adaptation.

The dissertation concludes with a discussion on the proposed approaches and future directions for research in closed set and generalized domain adaptation.
Date Created
2020
Agent

Incremental Learning With Sample Generation From Pretrained Networks

158259-Thumbnail Image.png
Description
In the last decade deep learning based models have revolutionized machine learning and computer vision applications. However, these models are data-hungry and training them is a time-consuming process. In addition, when deep neural networks are updated to augment their prediction

In the last decade deep learning based models have revolutionized machine learning and computer vision applications. However, these models are data-hungry and training them is a time-consuming process. In addition, when deep neural networks are updated to augment their prediction space with new data, they run into the problem of catastrophic forgetting, where the model forgets previously learned knowledge as it overfits to the newly available data. Incremental learning algorithms enable deep neural networks to prevent catastrophic forgetting by retaining knowledge of previously observed data while also learning from newly available data.

This thesis presents three models for incremental learning; (i) Design of an algorithm for generative incremental learning using a pre-trained deep neural network classifier; (ii) Development of a hashing based clustering algorithm for efficient incremental learning; (iii) Design of a student-teacher coupled neural network to distill knowledge for incremental learning. The proposed algorithms were evaluated using popular vision datasets for classification tasks. The thesis concludes with a discussion about the feasibility of using these techniques to transfer information between networks and also for incremental learning applications.
Date Created
2020
Agent

"I'm Having Trouble Understanding You Right Now": A Multi-Dimensional Evaluation of the Intelligibility of Dysphonic Speech

158233-Thumbnail Image.png
Description
Individuals with voice disorders experience challenges communicating daily. These challenges lead to a significant decrease in the quality of life for individuals with dysphonia. While voice amplification systems are often employed as a voice-assistive technology, individuals with voice disorders generally

Individuals with voice disorders experience challenges communicating daily. These challenges lead to a significant decrease in the quality of life for individuals with dysphonia. While voice amplification systems are often employed as a voice-assistive technology, individuals with voice disorders generally still experience difficulties being understood while using voice amplification systems. With the goal of developing systems that help improve the quality of life of individuals with dysphonia, this work outlines the landscape of voice-assistive technology, the inaccessibility of state-of-the-art voice-based technology and the need for the development of intelligibility improving voice-assistive technologies designed both with and for individuals with voice disorders. With the rise of voice-based technologies in society, in order for everyone to participate in the use of voice-based technologies individuals with voice disorders must be included in both the data that is used to train these systems and the design process. An important and necessary step towards the development of better voice assistive technology as well as more inclusive voice-based systems is the creation of a large, publicly available dataset of dysphonic speech. To this end, a web-based platform to crowdsource voice disorder speech was developed to create such a dataset. This dataset will be released so that it is freely and publicly available to stimulate research in the field of voice-assistive technologies. Future work includes building a robust intelligibility estimation model, as well as employing that model to measure, and therefore enhance, the intelligibility of a given utterance. The hope is that this model will lead to the development of voice-assistive technology using state-of-the-art machine learning models to help individuals with voice disorders be better understood.
Date Created
2020
Agent

Neural Network Architecture with External Memory and Domain-aware Weight Switching Mechanism

158180-Thumbnail Image.png
Description
Humans have an excellent ability to analyze and process information from multiple domains. They also possess the ability to apply the same decision-making process when the situation is familiar with their previous experience.

Inspired by human's ability to remember past

Humans have an excellent ability to analyze and process information from multiple domains. They also possess the ability to apply the same decision-making process when the situation is familiar with their previous experience.

Inspired by human's ability to remember past experiences and apply the same when a similar situation occurs, the research community has attempted to augment memory with Neural Network to store the previously learned information. Together with this, the community has also developed mechanisms to perform domain-specific weight switching to handle multiple domains using a single model. Notably, the two research fields work independently, and the goal of this dissertation is to combine their capabilities.

This dissertation introduces a Neural Network module augmented with two external memories, one allowing the network to read and write the information and another to perform domain-specific weight switching. Two learning tasks are proposed in this work to investigate the model performance - solving mathematics operations sequence and action based on color sequence identification. A wide range of experiments with these two tasks verify the model's learning capabilities.
Date Created
2020
Agent

Accessible Retail Shopping For The Visually Impaired Using Deep Learning

158127-Thumbnail Image.png
Description
Over the past decade, advancements in neural networks have been instrumental in achieving remarkable breakthroughs in the field of computer vision. One of the applications is in creating assistive technology to improve the lives of visually impaired people by making

Over the past decade, advancements in neural networks have been instrumental in achieving remarkable breakthroughs in the field of computer vision. One of the applications is in creating assistive technology to improve the lives of visually impaired people by making the world around them more accessible. A lot of research in convolutional neural networks has led to human-level performance in different vision tasks including image classification, object detection, instance segmentation, semantic segmentation, panoptic segmentation and scene text recognition. All the before mentioned tasks, individually or in combination, have been used to create assistive technologies to improve accessibility for the blind.

This dissertation outlines various applications to improve accessibility and independence for visually impaired people during shopping by helping them identify products in retail stores. The dissertation includes the following contributions; (i) A dataset containing images of breakfast-cereal products and a classifier using a deep neural (ResNet) network; (ii) A dataset for training a text detection and scene-text recognition model; (iii) A model for text detection and scene-text recognition to identify product images using a user-controlled camera; (iv) A dataset of twenty thousand products with product information and related images that can be used to train and test a system designed to identify products.
Date Created
2020
Agent

Language Image Transformer

158120-Thumbnail Image.png
Description
Humans perceive the environment using multiple modalities like vision, speech (language), touch, taste, and smell. The knowledge obtained from one modality usually complements the other. Learning through several modalities helps in constructing an accurate model of the environment. Most of

Humans perceive the environment using multiple modalities like vision, speech (language), touch, taste, and smell. The knowledge obtained from one modality usually complements the other. Learning through several modalities helps in constructing an accurate model of the environment. Most of the current vision and language models are modality-specific and, in many cases, extensively use deep-learning based attention mechanisms for learning powerful representations. This work discusses the role of attention in associating vision and language for generating shared representation. Language Image Transformer (LIT) is proposed for learning multi-modal representations of the environment. It uses a training objective based on Contrastive Predictive Coding (CPC) to maximize the Mutual Information (MI) between the visual and linguistic representations. It learns the relationship between the modalities using the proposed cross-modal attention layers. It is trained and evaluated using captioning datasets, MS COCO, and Conceptual Captions. The results and the analysis offers a perspective on the use of Mutual Information Maximisation (MIM) for generating generalizable representations across multiple modalities.
Date Created
2020
Agent

Zero Shot Learning for Visual Object Recognition with Generative Models

158117-Thumbnail Image.png
Description
Visual object recognition has achieved great success with advancements in deep learning technologies. Notably, the existing recognition models have gained human-level performance on many of the recognition tasks. However, these models are data hungry, and their performance is constrained by

Visual object recognition has achieved great success with advancements in deep learning technologies. Notably, the existing recognition models have gained human-level performance on many of the recognition tasks. However, these models are data hungry, and their performance is constrained by the amount of training data. Inspired by the human ability to recognize object categories based on textual descriptions of objects and previous visual knowledge, the research community has extensively pursued the area of zero-shot learning. In this area of research, machine vision models are trained to recognize object categories that are not observed during the training process. Zero-shot learning models leverage textual information to transfer visual knowledge from seen object categories in order to recognize unseen object categories.

Generative models have recently gained popularity as they synthesize unseen visual features and convert zero-shot learning into a classical supervised learning problem. These generative models are trained using seen classes and are expected to implicitly transfer the knowledge from seen to unseen classes. However, their performance is stymied by overfitting towards seen classes, which leads to substandard performance in generalized zero-shot learning. To address this concern, this dissertation proposes a novel generative model that leverages the semantic relationship between seen and unseen categories and explicitly performs knowledge transfer from seen categories to unseen categories. Experiments were conducted on several benchmark datasets to demonstrate the efficacy of the proposed model for both zero-shot learning and generalized zero-shot learning. The dissertation also provides a unique Student-Teacher based generative model for zero-shot learning and concludes with future research directions in this area.
Date Created
2020
Agent