Neuro-Symbolic AI Approaches to Enhance Deep Neural Networks with Logical Reasoning and Knowledge Integration

One of the challenges in Artificial Intelligence (AI) is to integrate fast, automatic, and intuitive System-1 thinking with slow, deliberate, and logical System-2 thinking. While deep learning approaches excel at perception tasks for System-1, their reasoning capabilities for System-2 are limited. Besides, deep learning approaches are usually data-hungry, have difficulty making use of explicit knowledge, and struggle with interpretability and justification. This dissertation presents three neuro-symbolic AI approaches that integrate neural networks (NNs) with symbolic AI methods to address these issues. The first approach presented in this dissertation is NeurASP, which combines NNs with Answer Set Programming (ASP), a logic programming formalism. NeurASP provides an effective way to integrate sub-symbolic and symbolic computation by treating NN outputs as probability distributions over atomic facts in ASP. The explicit knowledge encoded in ASP corrects mistakes in NN outputs and allows for better training with less data. To avoid NeurASP's bottleneck in symbolic computation, this dissertation presents a Constraint Loss via Straight-Through Estimators (CL-STE). CL-STE provides a systematic way to compile discrete logical constraints into a loss function over discretized NN outputs and scales significantly better than state-of-the-art neuro-symbolic methods. This dissertation also presents a finding when CL-STE was applied to Transformers. Transformers can be extended with recurrence to enhance their power for multi-step reasoning. Such a Recurrent Transformer can straightforwardly be applied to visual constraint reasoning problems while successfully addressing the symbol grounding problem.
Lastly, this dissertation addresses the limitation of pre-trained Large Language Models (LLMs) on multi-step logical reasoning problems with a dual-process neuro-symbolic reasoning system called LLM+ASP, where an LLM (e.g., GPT-3) serves as a highly effective few-shot semantic parser that turns natural language sentences into a logical form that can be used as input to ASP. LLM+ASP achieves state-of-the-art performance on several textual reasoning benchmarks and can handle robot planning tasks that an LLM alone fails to solve.
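The idea of treating NN outputs as probability distributions over atomic facts, and letting explicit knowledge correct them, can be illustrated with a minimal sketch. It is not the NeurASP implementation, which relies on an ASP solver; here the "knowledge" is a single hard-coded sum constraint checked by brute-force enumeration, and the function names, the three-class digit domain, and the probability values are all assumptions made for illustration.

```python
from itertools import product


def constraint_probability(dist1, dist2, target_sum):
    """Probability that d1 + d2 == target_sum when the two (hypothetical)
    classifier outputs are treated as independent distributions."""
    return sum(
        p1 * p2
        for (d1, p1), (d2, p2) in product(enumerate(dist1), enumerate(dist2))
        if d1 + d2 == target_sum
    )


def posterior(dist1, dist2, target_sum):
    """Joint distribution over (d1, d2), conditioned on the sum constraint."""
    z = constraint_probability(dist1, dist2, target_sum)
    return {
        (d1, d2): p1 * p2 / z
        for d1, p1 in enumerate(dist1)
        for d2, p2 in enumerate(dist2)
        if d1 + d2 == target_sum and p1 * p2 > 0
    }


# Made-up softmax outputs over the digits 0, 1, 2 for two images.
DIST1 = [0.1, 0.6, 0.3]  # most likely digit on its own: 1
DIST2 = [0.5, 0.4, 0.1]  # most likely digit on its own: 0

# Known fact used as explicit knowledge: the two digits add up to 3.
POST = posterior(DIST1, DIST2, target_sum=3)
BEST = max(POST, key=POST.get)
```

Taken separately, the two distributions favor the pair (1, 0), which violates the constraint. Conditioning on the constraint leaves only (1, 2) and (2, 1), and `BEST` is the more probable of the two. The value returned by `constraint_probability` could also serve as a training signal, since it is differentiable in the input probabilities.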
Mission and Motion Planning for Multi-robot Systems in Constrained Environments

As robots become mechanically more capable, they are going to be more and more integrated into our daily lives. Over time, human expectations of robot capabilities are rising. Therefore, it can be conjectured that robots will often not act as their human commanders intended. That is, the users of the robots may have a different point of view from that of the robots.

The first part of this dissertation covers methods that resolve some instances of this mismatch when the mission requirements are expressed in Linear Temporal Logic (LTL) for handling coverage, sequencing, conditions and avoidance. That is, the following general questions are addressed:

* What is the cause of the given mission being unrealizable?

* Is there any other feasible mission that is close to the given one?

In order to answer these questions, the LTL Revision Problem is introduced and formulated as a graph search problem. It is shown that in general the problem is NP-Complete. Hence, a heuristic algorithm is used, and it is proved to have a 2-approximation bound in some cases. This problem is then extended to two different versions: one for weighted transition systems and another for specifications under quantitative preference. Next, a follow-up question is addressed:

* How can an LTL specified mission be scaled up to multiple robots operating in confined environments?

The Cooperative Multi-agent Planning Problem is addressed by borrowing a technique from cooperative pathfinding problems in discrete grid environments. Since centralized planning for multi-robot systems is computationally challenging and easily results in state space explosion, a distributed planning approach is provided through agent coupling and de-coupling.

In addition, in order to make such robot missions work in the real world, robots should take actions in the continuous physical world. Hence, in the second part of this thesis, the resulting motion planning problem is addressed for non-holonomic robots.

Specifically, this part is devoted to autonomous vehicles’ motion planning in challenging environments such as rural, semi-structured roads. This planning problem is solved with an on-the-fly hierarchical approach, using a pre-computed lattice planner. It is also proved that the proposed algorithm guarantees resolution-completeness in such demanding environments. Finally, possible extensions are discussed.
Knowledge Representation, Reasoning and Learning for Non-Extractive Reading Comprehension

While in recent years deep learning (DL) based approaches have been the popular choice for developing end-to-end question answering (QA) systems, such systems lack several desired properties, such as the ability to do sophisticated reasoning with knowledge, the ability to learn using fewer resources, and interpretability. In this thesis, I explore solutions that aim to address these drawbacks.

Towards this goal, I work with a specific family of reading comprehension tasks, normally referred to as Non-Extractive Reading Comprehension (NRC), where the given passage does not contain enough information, and answering correctly requires sophisticated reasoning and "additional knowledge". I have organized the NRC tasks into three categories. Here I present my solutions to the first two categories and some preliminary results on the third category.

Category 1 NRC tasks refer to the scenarios where the required "additional knowledge" is missing but there exists a decent natural language parser. For these tasks, I learn the missing "additional knowledge" with the help of the parser and a novel inductive logic programming approach. The learned knowledge is then used to answer new questions. Experiments on three NRC tasks show that this approach, along with providing an interpretable solution, achieves better or comparable accuracy to that of the state-of-the-art DL based approaches.

Category 2 NRC tasks refer to the alternate scenario where the "additional knowledge" is available but no natural language parser works well for the sentences of the target domain. To deal with these tasks, I present a novel hybrid reasoning approach which combines symbolic and natural language inference (neural reasoning) and ultimately allows symbolic modules to reason over raw text without requiring any translation. Experiments on two NRC tasks show its effectiveness.

Category 3 NRC tasks provide neither the "missing knowledge" nor a good parser. This thesis does not provide an interpretable solution for this category, but presents some preliminary results and an analysis of a pure DL based approach. Nonetheless, the thesis shows that beyond the world of pure DL based approaches, there are tools that can offer interpretable solutions for challenging tasks without using many resources and possibly with better accuracy.
Activity Specification for Time-based Discrete Event Simulation Models

Computational models for relatively complex systems are subject to many difficulties, among which is the ability for the models to be discretely understandable and applicable to specific problem types and their solutions. This demands the specification of a dynamic system as a collection of models, including metamodels. In this context, new modeling approaches and tools can help provide a richer understanding and, therefore, the development of sophisticated behavior in system dynamics. From this vantage point, an activity specification is proposed as a modeling approach based on a time-based discrete event system abstraction. Such models are founded upon set-theoretic principles and methods for modeling and simulation with the intent of making them subject to specific and profound questions for user-defined experiments.

Because developing models is becoming more time-consuming and expensive, some research has focused on the acquisition of concrete means targeted at the early stages of component-based system analysis and design. The model-driven architecture (MDA) framework provides some means for the behavioral modeling of discrete systems. The development of models can benefit from simplifications and elaborations enabled by the MDA meta-layers, which is essential for managing model complexity. Although metamodels pose difficulties, especially for developing complex behavior, as opposed to structure, they are advantageous and complementary to formal models and concrete implementations in programming languages.

The developed approach is focused on action and control concepts across the MDA meta-layers and is proposed for the parallel Discrete Event System Specification (P-DEVS) formalism. The Unified Modeling Language (UML) activity meta-models are used with syntax and semantics that conform to the DEVS formalism and its execution protocol. The notions of the DEVS component and state are used together according to their underlying system-theoretic foundation. A prototype tool supporting activity modeling was developed to demonstrate the degree to which action-based behavior can be modeled using the MDA and DEVS. The parallel DEVS, as a formal approach, supports identifying the semantics of the UML activities. Another prototype was developed to create activity models and support their execution with the DEVS-Suite simulator, and a set of prototypical multiprocessor architecture model specifications were designed, simulated, and analyzed.
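For readers unfamiliar with the parallel DEVS execution protocol mentioned above, the following minimal sketch shows one atomic model and a single-model simulation loop. It is not taken from DEVS-Suite or from the prototype tools described here; the `Processor` model, its method names, and the `simulate` helper are hypothetical, and the confluent transition uses the conventional default of applying the internal transition first and then the external transition with zero elapsed time.

```python
INFINITY = float("inf")


class Processor:
    """Hypothetical atomic model: handles one job at a time and
    ignores jobs that arrive while it is busy."""

    def __init__(self, processing_time):
        self.processing_time = processing_time
        self.phase = "passive"
        self.sigma = INFINITY  # time left until the next internal event
        self.job = None

    def time_advance(self):
        return self.sigma

    def output(self):
        # Invoked just before the internal transition.
        return self.job

    def internal_transition(self):
        self.phase, self.sigma, self.job = "passive", INFINITY, None

    def external_transition(self, elapsed, jobs):
        if self.phase == "passive":
            self.phase, self.sigma, self.job = "busy", self.processing_time, jobs[0]
        else:
            self.sigma -= elapsed  # keep working on the current job

    def confluent_transition(self, jobs):
        # Conventional default: internal first, then external with elapsed = 0.
        self.internal_transition()
        self.external_transition(0.0, jobs)


def simulate(model, arrivals, end_time):
    """Run one atomic model; `arrivals` maps time -> input bag (list of jobs).
    Returns the list of (time, output) pairs produced up to end_time."""
    outputs, last, i = [], 0.0, 0
    pending = sorted(arrivals.items())
    while True:
        t_int = last + model.time_advance()
        t_ext = pending[i][0] if i < len(pending) else INFINITY
        t = min(t_int, t_ext)
        if t > end_time:
            break
        if t_int == t:
            outputs.append((t, model.output()))
        if t_int == t and t_ext == t:
            model.confluent_transition(pending[i][1])
            i += 1
        elif t_int == t:
            model.internal_transition()
        else:
            model.external_transition(t - last, pending[i][1])
            i += 1
        last = t
    return outputs


# Job "j2" arrives while the processor is busy and is dropped;
# "j3" arrives exactly when "j1" finishes (a confluent event).
TRACE = simulate(Processor(3.0), {0.0: ["j1"], 1.0: ["j2"], 3.0: ["j3"]}, 10.0)
```

In this run the processor emits "j1" at time 3 and "j3" at time 6. The confluent case at time 3 is the one that distinguishes parallel DEVS from classic DEVS, where simultaneous internal and external events are resolved by a tie-breaking function rather than by a dedicated transition.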
Towards understanding natural language: semantic parsing, commonsense knowledge acquisition, reasoning framework and applications

Reasoning with commonsense knowledge is an integral component of human behavior. It is due to this capability that people know that a weak person may not be able to lift someone. It has been a long-standing goal of the Artificial Intelligence community to simulate such commonsense reasoning abilities in machines. Over the years, many advances have been made and various challenges have been proposed to test these abilities. The Winograd Schema Challenge (WSC) is one such Natural Language Understanding (NLU) task, which was also proposed as an alternative to the Turing Test. It is made up of textual question answering problems which require resolution of a pronoun to its correct antecedent.

In this thesis, two approaches to developing NLU systems to solve the Winograd Schema Challenge are demonstrated. To this end, a semantic parser is presented, various kinds of commonsense knowledge are identified, techniques to extract commonsense knowledge are developed, and two commonsense reasoning algorithms are presented. The usefulness of the developed tools and techniques is shown by applying them to solve the challenge.
Reasoning and Learning with Probabilistic Answer Set Programming

Knowledge Representation (KR) is one of the prominent approaches to Artificial Intelligence (AI) that is concerned with representing knowledge in a form that computer systems can utilize to solve complex problems. Answer Set Programming (ASP), based on the stable model semantics, is a widely-used KR framework that facilitates elegant and efficient representations for many problem domains that require complex reasoning.

However, while ASP is effective on deterministic problem domains, it is not suitable for applications involving quantitative uncertainty, for example, those that require probabilistic reasoning. Furthermore, it is hard to utilize information that can be statistically induced from data with ASP problem modeling.

This dissertation presents the language LP^MLN, which is a probabilistic extension of the stable model semantics with the concept of weighted rules, inspired by Markov Logic. An LP^MLN program defines a probability distribution over "soft" stable models, which may not satisfy all rules, but the more rules with the bigger weights they satisfy, the bigger their probabilities. LP^MLN takes advantage of both ASP and Markov Logic in a single framework, allowing representation of problems that require both logical and probabilistic reasoning in an intuitive and elaboration tolerant way.
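The distribution over "soft" stable models can be made concrete with a small brute-force sketch. It assumes the usual definition in which an interpretation I gets weight exp(sum of the weights of the rules it satisfies) if I is a stable model of exactly those satisfied rules, and weight 0 otherwise. The sketch is restricted to soft rules with a single atom in the head, and the rule encoding, function names, and the two-rule example program are illustrative assumptions rather than the interface of any LP^MLN system.

```python
import math
from itertools import combinations

# A weighted rule is (weight, head, positive_body, negative_body).
# Example program (made up):  1: a.    2: b <- a.
RULES = [
    (1.0, "a", frozenset(), frozenset()),
    (2.0, "b", frozenset({"a"}), frozenset()),
]
ATOMS = ["a", "b"]


def satisfies(interp, rule):
    """True if the interpretation satisfies the rule (body false or head true)."""
    _, head, pos, neg = rule
    body_true = pos <= interp and not (neg & interp)
    return (not body_true) or head in interp


def is_stable(interp, rules):
    """Check whether interp is the least model of the reduct of `rules`."""
    reduct = [(head, pos) for _, head, pos, neg in rules if not (neg & interp)]
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos in reduct:
            if pos <= model and head not in model:
                model.add(head)
                changed = True
    return model == set(interp)


def lpmln_distribution(atoms, rules):
    """Enumerate all interpretations and normalize their weights."""
    weights = {}
    for size in range(len(atoms) + 1):
        for subset in combinations(atoms, size):
            interp = frozenset(subset)
            satisfied = [r for r in rules if satisfies(interp, r)]
            if is_stable(interp, satisfied):
                weights[interp] = math.exp(sum(r[0] for r in satisfied))
            else:
                weights[interp] = 0.0
    total = sum(weights.values())
    return {interp: w / total for interp, w in weights.items()}


DIST = lpmln_distribution(ATOMS, RULES)
```

For this program, {a, b} satisfies both rules and receives the highest probability, the empty set and {a} each violate one rule and receive smaller but nonzero probabilities, and {b} receives probability 0 because nothing in its satisfied rules supports b. Exhaustive enumeration is exponential in the number of atoms, so this only serves to illustrate the semantics.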

This dissertation establishes formal relations between LP^MLN and several other formalisms, discusses inference and weight learning algorithms under LP^MLN, and presents systems implementing the algorithms. LP^MLN systems can be used to compute other languages translatable into LP^MLN.

The advantage of LP^MLN for probabilistic reasoning is illustrated by a probabilistic extension of the action language BC+, called pBC+, defined as a high-level notation of LP^MLN for describing transition systems. Various probabilistic reasoning about transition systems, especially probabilistic diagnosis, can be modeled in pBC+ and computed using LP^MLN systems. pBC+ is further extended with the notion of utility, through a decision-theoretic extension of LP^MLN, and related with Markov Decision Process (MDP) in terms of policy optimization problems. pBC+ can be used to represent (PO)MDP in a succinct and elaboration tolerant way, which enables planning with (PO)MDP algorithms in action domains whose description requires rich KR constructs, such as recursive definitions and indirect effects of actions.
Explainable Fact Checking by Combining Automated Rule Discovery with Probabilistic Answer Set Programming

The goal of fact checking is to determine if a given claim holds. A promising approach for this task is to exploit reference information in the form of knowledge graphs (KGs), a structured and formal representation of knowledge with semantic descriptions of entities and relations. KGs are successfully used in multiple applications, but the information stored in a KG is inevitably incomplete. In order to address the incompleteness problem, this thesis proposes a new method built on top of recent results in logical rule discovery in KGs called RuDik and a probabilistic extension of answer set programs called LP^MLN.

This thesis presents the integration of RuDik, which discovers logical rules over a given KG, and LP^MLN, which performs probabilistic inference to validate a fact. While automatically discovered rules over a KG are for human selection and revision, they can be turned into LP^MLN programs with a minor modification. Leveraging the probabilistic inference in LP^MLN, it is possible to (i) derive new information which is not explicitly stored in a KG with a probability associated with it, and (ii) provide supporting facts and rules for interpretable explanations for such decisions.

Also, this thesis presents experiments and results to show that this approach can label claims with high precision. The evaluation of the system also sheds light on the role played by the quality of the given rules and the quality of the KG.
Knowledge and Reasoning for Image Understanding

Image Understanding is a long-established discipline in computer vision, which encompasses a body of advanced image processing techniques that are used to locate (“where”), characterize and recognize (“what”) objects, regions, and their attributes in the image. However, the notion of “understanding” (and the goal of artificially intelligent machines) goes beyond factual recall of the recognized components and includes reasoning and thinking beyond what can be seen (or perceived). Understanding is often evaluated by asking questions of increasing difficulty. Thus, the expected functionalities of an intelligent Image Understanding system can be expressed in terms of the functionalities that are required to answer questions about an image. Answering questions about images requires primarily three components: Image Understanding, question (natural language) understanding, and reasoning based on knowledge. Any question that asks beyond what can be directly seen requires modeling of commonsense (or background/ontological/factual) knowledge and reasoning.

Knowledge and reasoning have seen scarce use in image understanding applications. In this thesis, we demonstrate the utility of incorporating background knowledge and using explicit reasoning in image understanding applications. We first present a comprehensive survey of the previous work that utilized background knowledge and reasoning in understanding images. This survey outlines the limited use of commonsense knowledge in high-level applications. We then present a set of vision and reasoning-based methods to solve several applications and show that these approaches benefit in terms of accuracy and interpretability from the explicit use of knowledge and reasoning. We propose novel knowledge representations of images, knowledge acquisition methods, and a new implementation of an efficient probabilistic logical reasoning engine that can utilize publicly available commonsense knowledge to solve applications such as visual question answering and image puzzles. Additionally, we identify the need for new datasets that explicitly require external commonsense knowledge to solve. We propose the new task of Image Riddles, which requires a combination of vision and reasoning based on ontological knowledge, and we collect a sufficiently large dataset to serve as an ideal testbed for vision and reasoning research. Lastly, we propose end-to-end deep architectures that can combine vision, knowledge and reasoning modules together and achieve large performance boosts over state-of-the-art methods.
Representing and Reasoning about Dynamic Multi-Agent Domains: An Action Language Approach

Reasoning about actions forms the basis of many tasks such as prediction, planning, and diagnosis in a dynamic domain. Within the reasoning about actions community, a broad class of languages, called action languages, has been developed together with a methodology for their use in representing and reasoning about dynamic domains. With a few notable exceptions, the focus of these efforts has largely centered around single-agent systems. Agents rarely operate in a vacuum, however, and almost in parallel, substantial work has been done within the dynamic epistemic logic community towards understanding how the actions of an agent may affect not just his own knowledge and/or beliefs, but those of his fellow agents as well. What is less understood by both communities is how to represent and reason about both the direct and indirect effects of both ontic and epistemic actions within a multi-agent setting. This dissertation presents ongoing research towards a framework for representing and reasoning about dynamic multi-agent domains involving both classes of actions.

The contributions of this work are as follows: the formulation of a precise mathematical model of a dynamic multi-agent domain based on the notion of a transition diagram; the development of the multi-agent action languages mA+ and mAL based upon this model, as well as preliminary investigations of their properties and implementations via logic programming under the answer set semantics; precise formulations of the temporal projection and planning problems within a multi-agent context; and an investigation of the application of the proposed approach to the representation of, and reasoning about, scenarios involving the modalities of knowledge and belief.
Forensic Methods and Tools for Web Environments

The Web is one of the most exciting and dynamic areas of development in today’s technology. However, with such activity, innovation, and ubiquity have come a set of new challenges for digital forensic examiners, making their jobs even more difficult. For examiners to become as effective with evidence from the Web as they currently are with more traditional evidence, they need (1) methods that guide them to know how to approach this new type of evidence and (2) tools that accommodate web environments’ unique characteristics.

In this dissertation, I present my research to alleviate the difficulties forensic examiners currently face with respect to evidence originating from web environments. First, I introduce a framework for web environment forensics, which elaborates on and addresses the key challenges examiners face and outlines a method for how to approach web-based evidence. Next, I describe my work to identify extensions installed on encrypted web thin clients using only a sound understanding of these systems’ inner workings and the metadata of the encrypted files. Finally, I discuss my approach to reconstructing the timeline of events on encrypted web thin clients by using service provider APIs as a proxy for directly analyzing the device. In each of these research areas, I also introduce structured formats that I customized to accommodate the unique features of the evidence sources while also facilitating tool interoperability and information sharing.
