Towards understanding natural language: semantic parsing, commonsense knowledge acquisition, reasoning framework and applications

157602-Thumbnail Image.png
Description
Reasoning with commonsense knowledge is an integral component of human behavior. It is due to this capability that people know that a weak person may not be able to lift someone. It has been a long standing goal of the

Reasoning with commonsense knowledge is an integral component of human behavior. It is due to this capability that people know that a weak person may not be able to lift someone. It has been a long standing goal of the Artificial Intelligence community to simulate such commonsense reasoning abilities in machines. Over the years, many advances have been made and various challenges have been proposed to test their abilities. The Winograd Schema Challenge (WSC) is one such Natural Language Understanding (NLU) task which was also proposed as an alternative to the Turing Test. It is made up of textual question answering problems which require resolution of a pronoun to its correct antecedent.

In this thesis, two approaches of developing NLU systems to solve the Winograd Schema Challenge are demonstrated. To this end, a semantic parser is presented, various kinds of commonsense knowledge are identified, techniques to extract commonsense knowledge are developed and two commonsense reasoning algorithms are presented. The usefulness of the developed tools and techniques is shown by applying them to solve the challenge.
Date Created
2019
Agent

Explainable Fact Checking by Combining Automated Rule Discovery with Probabilistic Answer Set Programming

156602-Thumbnail Image.png
Description
The goal of fact checking is to determine if a given claim holds. A promising ap- proach for this task is to exploit reference information in the form of knowledge graphs (KGs), a structured and formal representation of knowledge with

The goal of fact checking is to determine if a given claim holds. A promising ap- proach for this task is to exploit reference information in the form of knowledge graphs (KGs), a structured and formal representation of knowledge with semantic descriptions of entities and relations. KGs are successfully used in multiple appli- cations, but the information stored in a KG is inevitably incomplete. In order to address the incompleteness problem, this thesis proposes a new method built on top of recent results in logical rule discovery in KGs called RuDik and a probabilistic extension of answer set programs called LPMLN.

This thesis presents the integration of RuDik which discovers logical rules over a given KG and LPMLN to do probabilistic inference to validate a fact. While automatically discovered rules over a KG are for human selection and revision, they can be turned into LPMLN programs with a minor modification. Leveraging the probabilistic inference in LPMLN, it is possible to (i) derive new information which is not explicitly stored in a KG with a probability associated with it, and (ii) provide supporting facts and rules for interpretable explanations for such decisions.

Also, this thesis presents experiments and results to show that this approach can label claims with high precision. The evaluation of the system also sheds light on the role played by the quality of the given rules and the quality of the KG.
Date Created
2018
Agent

A Framework for Interactive Geospatial Map Cleaning using GPS Trajectories

155987-Thumbnail Image.png
Description
A volunteered geographic information system, e.g., OpenStreetMap (OSM), collects data from volunteers to generate geospatial maps. To keep the map consistent, volunteers are expected to perform the tedious task of updating the underlying geospatial data at regular intervals. Such a

A volunteered geographic information system, e.g., OpenStreetMap (OSM), collects data from volunteers to generate geospatial maps. To keep the map consistent, volunteers are expected to perform the tedious task of updating the underlying geospatial data at regular intervals. Such a map curation step takes time and considerable human effort. In this thesis, we propose a framework that improves the process of updating geospatial maps by automatically identifying road changes from user-generated GPS traces. Since GPS traces can be sparse and noisy, the proposed framework validates the map changes with the users before propagating them to a publishable version of the map. The proposed framework achieves up to four times faster map matching performance than the state-of-the-art algorithms with only 0.1-0.3% accuracy loss.
Date Created
2017
Agent

A Biased Topic Modeling Approach for Case Control Study from Health Related Social Media Postings

155912-Thumbnail Image.png
Description
Online social networks are the hubs of social activity in cyberspace, and using them to exchange knowledge, experiences, and opinions is common. In this work, an advanced topic modeling framework is designed to analyse complex longitudinal health information from social

Online social networks are the hubs of social activity in cyberspace, and using them to exchange knowledge, experiences, and opinions is common. In this work, an advanced topic modeling framework is designed to analyse complex longitudinal health information from social media with minimal human annotation, and Adverse Drug Events and Reaction (ADR) information is extracted and automatically processed by using a biased topic modeling method. This framework improves and extends existing topic modelling algorithms that incorporate background knowledge. Using this approach, background knowledge such as ADR terms and other biomedical knowledge can be incorporated during the text mining process, with scores which indicate the presence of ADR being generated. A case control study has been performed on a data set of twitter timelines of women that announced their pregnancy, the goals of the study is to compare the ADR risk of medication usage from each medication category during the pregnancy.

In addition, to evaluate the prediction power of this approach, another important aspect of personalized medicine was addressed: the prediction of medication usage through the identification of risk groups. During the prediction process, the health information from Twitter timeline, such as diseases, symptoms, treatments, effects, and etc., is summarized by the topic modelling processes and the summarization results is used for prediction. Dimension reduction and topic similarity measurement are integrated into this framework for timeline classification and prediction. This work could be applied to provide guidelines for FDA drug risk categories. Currently, this process is done based on laboratory results and reported cases.

Finally, a multi-dimensional text data warehouse (MTD) to manage the output from the topic modelling is proposed. Some attempts have been also made to incorporate topic structure (ontology) and the MTD hierarchy. Results demonstrate that proposed methods show promise and this system represents a low-cost approach for drug safety early warning.
Date Created
2017
Agent

A Benchmark for Automated Fact Checking with Knowledge Bases

134154-Thumbnail Image.png
Description
The need for automated / computational fact checking has grown substantially in recent times due to the high volume of false information and limited workforce of human fact checkers. This need has spawned research and new developments in this field

The need for automated / computational fact checking has grown substantially in recent times due to the high volume of false information and limited workforce of human fact checkers. This need has spawned research and new developments in this field and has created many different systems and approaches to this complex problem. This paper attempts to not just explain the most popular methods that are currently being used, but provide experimental results of the comparison of two different systems, the replication of results from their respective papers, and an annotated data-set of different test sentences to be used in these systems.
Date Created
2017-12
Agent

SPSR efficient processing of socially k-nearest neighbors with spatial range filter

154864-Thumbnail Image.png
Description
Social media has become popular in the past decade. Facebook for example has 1.59 billion active users monthly. With such massive social networks generating lot of data, everyone is constantly looking for ways of leveraging the knowledge from social networks

Social media has become popular in the past decade. Facebook for example has 1.59 billion active users monthly. With such massive social networks generating lot of data, everyone is constantly looking for ways of leveraging the knowledge from social networks to make their systems more personalized to their end users. And with rapid increase in the usage of mobile phones and wearables, social media data is being tied to spatial networks. This research document proposes an efficient technique that answers socially k-Nearest Neighbors with Spatial Range Filter. The proposed approach performs a joint search on both the social and spatial domains which radically improves the performance compared to straight forward solutions. The research document proposes a novel index that combines social and spatial indexes. In other words, graph data is stored in an organized manner to filter it based on spatial (region of interest) and social constraints (top-k closest vertices) at query time. That leads to pruning necessary paths during the social graph traversal procedure, and only returns the top-K social close venues. The research document then experimentally proves how the proposed approach outperforms existing baseline approaches by at least three times and also compare how each of our algorithms perform under various conditions on a real geo-social dataset extracted from Yelp.
Date Created
2016
Agent