Generating Vocabulary Sets for Implicit Language Learning using Masked Language Modeling

Globalization is driving a rapid increase in motivation for learning new languages, with online and mobile language learning applications being an extremely popular method of doing so. Many language learning applications focus almost exclusively on aiding students in acquiring vocabulary, one of the most important elements in achieving fluency in a language. A well-balanced language curriculum must include both explicit vocabulary instruction and implicit vocabulary learning through interaction with authentic language materials. However, most language learning applications focus only on explicit instruction, providing little support for implicit learning. Students require support with implicit vocabulary learning because they need enough context to guess and acquire new words. Traditional techniques aim to teach students enough vocabulary to comprehend the text, thus enabling them to acquire new words. Despite the wide variety of support for vocabulary learning offered by learning applications today, few offer guidance on how to select an optimal vocabulary study set.

This thesis proposes a novel method of student modeling which uses pre-trained masked language models to model a student's reading comprehension abilities and detect words which are required for comprehension of a text. It explores the efficacy of using pre-trained masked language models to model human reading comprehension and presents a vocabulary study set generation pipeline using this method. This pipeline creates vocabulary study sets for explicit language learning that enable comprehension while still leaving some words to be acquired implicitly. Promising results show that masked language modeling can be used to model human comprehension and that the pipeline produces reasonably sized vocabulary study sets.
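
The selection idea can be pictured with a minimal sketch. It assumes a hypothetical `predict_masked` callable standing in for a pre-trained masked language model, and it treats an unknown word as inferable when that model, seeing only the words the student already knows, lists it among its guesses. The function names, the `[MASK]` and `[UNK]` tokens, and the toy predictor are illustrative assumptions, not the pipeline described in the thesis.

```python
from typing import Callable, List, Set

MASK = "[MASK]"  # assumed placeholder for the position being guessed
UNK = "[UNK]"    # assumed placeholder for other words the student does not know


def build_study_set(
    tokens: List[str],
    known: Set[str],
    predict_masked: Callable[[List[str]], List[str]],
) -> Set[str]:
    """Return the unknown words that cannot be recovered from known context."""
    study_set: Set[str] = set()
    for i, word in enumerate(tokens):
        if word in known:
            continue
        # Hide every unknown word, then mask the position under test.
        context = [t if t in known else UNK for t in tokens]
        context[i] = MASK
        if word not in predict_masked(context):
            # Not guessable from context: study it explicitly.
            study_set.add(word)
    return study_set


def toy_predictor(context: List[str]) -> List[str]:
    """Hypothetical stand-in for a masked language model's top guesses."""
    i = context.index(MASK)
    if i > 0 and context[i - 1] == "drinks":
        return ["water", "tea"]
    return []


tokens = "the cat drinks water near the veranda".split()
known = {"the", "cat", "drinks", "near"}
study_set = build_study_set(tokens, known, toy_predictor)
```

Here "water" is left for implicit acquisition because the stand-in model can guess it from "drinks", while "veranda" goes into the explicit study set. A real model would replace `toy_predictor`, and the guess list would typically be cut off at some top-k threshold.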

Domain-Agnostic Context-Aware Assistant Framework for Task-Based Environment

Smart home assistants are becoming the norm due to their ease of use. They employ spoken language as an interface, facilitating easy interaction with their users. Despite their obvious advantages, natural-language-based interfaces are not prevalent outside the domain of home assistants. They are hard to adopt for computer-controlled systems because of the numerous complexities involved in implementing them across varying fields. The main challenge is grounding natural language terms in the underlying system's primitives. The existing systems that do use natural language interfaces are each specific to a single problem domain.

In this thesis, a domain-agnostic framework that creates natural language interfaces for computer-controlled systems has been developed by making the mapping between the language constructs and the system primitives customizable. The framework employs ontologies built using OWL (Web Ontology Language) for knowledge representation purposes and machine learning models for language processing tasks. It has been evaluated within a simulation environment consisting of objects and a robot. This environment has been deployed as a web application, providing anonymous user testing for evaluation, and generating training data for machine learning components. Performance evaluation has been done on metrics such as time taken for a task or the number of instructions given by the user to the robot to accomplish a task. Additionally, the framework has been used to create a natural language interface for a database system to demonstrate its domain independence.

Ensemble Learning on Deep Neural Networks for Image Caption Generation

Capturing the information in an image in a natural language sentence is considered a difficult problem for computers to solve. Image captioning involves not just detecting objects in images but also understanding the interactions between those objects so that they can be translated into relevant captions. Expertise in both computer vision and natural language processing is therefore considered crucial for this task. The sequence-to-sequence modelling strategy of deep neural networks is the traditional approach to generating a sequence of words that together describe the image. However, these models suffer from high variance: they do not generalize well beyond the training data.

The main focus of this thesis is to reduce this variance, which helps in generating better captions. To achieve this, ensemble learning techniques, which are known for mitigating the high variance problem in machine learning algorithms, have been explored. Three different ensemble techniques, namely k-fold ensemble, bootstrap aggregation ensemble, and boosting ensemble, have been evaluated in this thesis. For each of these techniques, three output combination approaches have been analyzed. Extensive experiments have been conducted on the Flickr8k dataset, which contains 8000 images with 5 different captions for every image. The BLEU score, a standard metric for evaluating generated text in natural language processing (NLP), is used to evaluate the predictions. Based on this metric, the analysis shows that ensemble learning performs significantly better and generates more meaningful captions than any of the individual models used.
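
As a rough illustration of bootstrap aggregation, the sketch below draws one resample of a training set and combines the captions proposed by several models with a simple majority vote. The abstract does not say which three output combination approaches were analyzed, so majority voting is an assumed example, and the function names are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence


def bootstrap_sample(data: Sequence[str], rng: random.Random) -> List[str]:
    """Draw len(data) items with replacement: one resample for bagging."""
    return [rng.choice(data) for _ in data]


def majority_vote(captions: Sequence[str]) -> str:
    """Return the caption proposed most often; ties go to the earliest one."""
    if not captions:
        raise ValueError("need at least one caption")
    counts = Counter(captions)
    best = max(counts.values())
    for caption in captions:
        if counts[caption] == best:
            return caption
    raise AssertionError("unreachable: some caption always has the top count")


rng = random.Random(0)
image_ids = ["img1", "img2", "img3", "img4"]
resample = bootstrap_sample(image_ids, rng)  # each model trains on its own resample

proposals = ["a dog runs", "a dog sits", "a dog runs"]  # one caption per model
combined = majority_vote(proposals)
```

In a full bagging setup, each ensemble member would be trained on a different resample, and the vote would be taken over their predictions for the same image.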

UVLabel: A Tool for the Future of Interferometry Analysis

UVLabel was created to enable radio astronomers to view and annotate their own data such that they could then expand their future research paths. It simplifies their data rendering process by providing a simple user interface to better access sections of their data. Furthermore, it provides an interface to track trends in their data through a labelling feature.

The tool was developed following the incremental development process in order to quickly create a functional and testable tool. The incremental process also allowed for feedback from radio astronomers to help guide the project's development.

UVLabel provides both a functional product and a modifiable, scalable code base for radio astronomer developers. This gives astronomers who study various kinds of astronomical interferometric data the ability to label that data. The tool can then be used to improve their filtering methods, pursue machine learning solutions, and discover new trends. Finally, UVLabel will be open source to put customization, scalability, and adaptability in the hands of these researchers.

Graph Search as a Feature in Imperative/Procedural Programming Languages

Graph theory is a critical component of computer science and software engineering, with algorithms for graph traversal and comprehension powering many of the largest problems in both industry and research. Engineers and researchers often have an accurate view of their target graph, yet they struggle to implement a correct and efficient search over that graph.

To facilitate rapid, correct, efficient, and intuitive development of graph-based solutions, we propose a new programming language construct: the search statement. Given a supra-root node, a procedure which determines the children of a given parent node, and optional definitions of the fail-fast acceptance or rejection of a solution, the search statement can conduct a search over any graph or network. Structurally, this statement is modelled after the common switch statement and is placed in a largely imperative/procedural context to allow for immediate and intuitive development by most programmers. The Go programming language has been used as the foundation for a proof of concept of the search statement. A Go compiler which implements this construct is provided.
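
The ingredients of the search statement can be approximated as an ordinary function. The sketch below is a Python analogue, not the Go construct itself: `search`, its parameter names, and the depth-first order are assumptions made for illustration. It takes a root, a `children` procedure, an `accept` test, and an optional `reject` test that prunes a branch early. It keeps no visited set, so it assumes that `children` never leads back to an earlier node.

```python
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def search(
    root: T,
    children: Callable[[T], Iterable[T]],
    accept: Callable[[T], bool],
    reject: Optional[Callable[[T], bool]] = None,
) -> Iterator[T]:
    """Depth-first search that yields every accepted node reachable from root."""
    stack = [root]
    while stack:
        node = stack.pop()
        if reject is not None and reject(node):
            continue  # fail fast: abandon this branch
        if accept(node):
            yield node  # a solution; in this sketch it is not expanded further
            continue
        # Push children in reverse so the first child is explored first.
        stack.extend(reversed(list(children(node))))


# Usage: binary strings of length 3 that never contain "11".
solutions = list(
    search(
        root="",
        children=lambda s: [s + "0", s + "1"],
        accept=lambda s: len(s) == 3,
        reject=lambda s: "11" in s,
    )
)
```

Because rejection is checked before expansion, the branch starting with "11" is dropped as soon as it appears rather than after all of its descendants have been generated.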

An Extendable Python IoT Server and Task Handler

The Internet of Things (IoT) is a term used to refer to the billions of Internet-connected, embedded devices that communicate with one another for the purpose of sharing data or performing actions. One of the core uses of this network is the ability of its devices and services to interact with one another to automate daily tasks and routines. For example, IoT devices are often used to automate tasks within the household, such as turning the lights on and off or starting the coffee pot. However, designing a modular system to create and schedule these routines is a difficult task.

Current IoT integration utilities attempt to simplify this task, but most fail to satisfy one of the requirements many users have for such a system: simplified integration with third-party devices. This project seeks to solve this issue through the creation of an easily extendable, modular integration utility. It is open source and does not require the use of a cloud-based server, with users hosting the server themselves. With a server and data controller implemented in pure Python and a library for embedded ESP8266 microcontroller-powered devices, the solution seeks to satisfy both casual users and those interested in developing their own integrations.

ReL GoalD (Reinforcement Learning for Goal Dependencies)

In this project, the use of deep neural networks for selecting actions to execute within an environment in order to achieve a goal is explored. Scenarios like this are common in crafting-based games such as Terraria or Minecraft. Goals in these environments have recursive sub-goal dependencies which form a dependency tree. An agent operating within these environments has access to little data about the environment before interacting with it, so it is crucial that the agent is able to effectively utilize a tree of dependencies and its environmental surroundings to make judgements about which sub-goals are most efficient to pursue at any point in time. A successful agent aims to minimize cost when completing a given goal. A deep neural network in combination with Q-learning techniques was employed to act as the agent in this environment. This agent consistently performed better than agents using alternate models (models that used dependency tree heuristics or human-like approaches to make sub-goal oriented choices), with an average performance advantage of 33.86% (with a standard deviation of 14.69%) over the best alternate agent. This shows that machine learning techniques can be consistently employed to make goal-oriented choices within an environment with recursive sub-goal dependencies and little information known in advance.
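
For readers unfamiliar with Q-learning, the sketch below shows the standard update rule in its simplest tabular form. This is a deliberate simplification: the project used a deep neural network to estimate action values, whereas here a dictionary stands in for the network. The state and action names, the learning rate `alpha`, the discount `gamma`, and the choice to express cost as a negative reward are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(
    q: QTable,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    next_actions: Iterable[Hashable],
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> float:
    """Apply one Q-learning step and return the new value of (state, action)."""
    # Value of the best action available afterwards; 0.0 if there is none.
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


q: QTable = defaultdict(float)
q[("has_wood", "craft_planks")] = 2.0  # hypothetical, previously learned value

# Gathering wood costs 1 unit (reward -1.0) and leads to the "has_wood" state.
new_value = q_update(
    q,
    state="start",
    action="gather_wood",
    reward=-1.0,
    next_state="has_wood",
    next_actions=["craft_planks", "gather_wood"],
    alpha=0.5,
    gamma=0.9,
)
```

The update moves the estimate for gathering wood toward the immediate cost plus the discounted value of the best follow-up action, which is how the value of completing a sub-goal propagates back to the actions that enable it.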