Security Analysis of x86 Processor Microcode

Description
Modern computer processors contain embedded firmware, known as microcode, that controls the decoding and execution of x86 instructions. Although proprietary and relatively obscure, this microcode can be modified using updates released by hardware manufacturers to correct processor logic flaws (errata). At the same time, a malicious microcode update could compromise a processor by implementing new malicious instructions or altering the functionality of existing instructions, including processor-accelerated virtualization or cryptographic primitives. Not only is this attack vector capable of subverting all software-enforced security policies and access controls, but it also leaves behind no postmortem forensic evidence, since the write-only patch memory is cleared upon system reset. Although supervisor privileges (ring zero) are required to update processor microcode, this attack cannot be easily mitigated because the microcode update functionality is implemented within the processor silicon. In this paper, we reveal the microarchitecture and mechanism of microcode updates, present a security analysis of this attack vector, and provide some mitigation suggestions.
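
As a small, hedged illustration of the visibility problem described above, the sketch below reads the microcode revision that Linux reports for each logical CPU via /proc/cpuinfo. Consistent with the paper's point, this reveals only the currently loaded revision; it cannot recover any history after a reset.

```python
# Minimal sketch: read the microcode revision Linux reports per CPU.
# Assumes a Linux host; on x86, /proc/cpuinfo exposes a "microcode" field.

def microcode_revisions(path="/proc/cpuinfo"):
    revisions = {}
    cpu = None
    with open(path) as f:
        for line in f:
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            if key == "processor":
                cpu = int(value)
            elif key == "microcode" and cpu is not None:
                revisions[cpu] = value  # e.g. "0xb000040"
    return revisions

if __name__ == "__main__":
    for cpu, rev in microcode_revisions().items():
        print(f"cpu{cpu}: microcode revision {rev}")
```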
Date Created
2014-05

Computing a Probabilistic Extension of Answer Set Program Language Using ASP and Markov Logic Solvers

Description
LPMLN is a recent probabilistic logic programming language which combines both Answer Set Programming (ASP) and Markov Logic. It is a proper extension of Answer Set Programs which allows for reasoning about uncertainty using weighted rules under the stable model semantics, with a weight scheme adopted from Markov Logic. LPMLN has been shown to be related to several formalisms from the knowledge representation (KR) side, such as ASP and P-log, and from the statistical relational learning (SRL) side, such as Markov Logic Networks (MLN), ProbLog, and Pearl's causal models (PCM). Formalisms like ASP, P-log, ProbLog, MLN, and PCM have all been shown to be embeddable in LPMLN, which demonstrates the expressivity of the language. Interestingly, LPMLN has also been shown to be reducible to ASP and MLN, which is not only theoretically interesting but also practically important from a computational point of view, in that the reductions yield ways to compute LPMLN programs using ASP and MLN solvers. Additionally, the reductions allow users to compute other formalisms which can be reduced to LPMLN.
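
To make the weight scheme concrete, here is a toy sketch (not an LPMLN solver) of the log-linear semantics: a stable model's probability is proportional to the exponential of the total weight of the rules it satisfies. The models, rules, and weights below are invented for illustration.

```python
import math

# Hypothetical weighted rules and the stable models that satisfy them.
weighted_rules = {"r1": 2.0, "r2": 1.0, "r3": 0.5}
models = {
    "M1": {"r1", "r2"},
    "M2": {"r1", "r3"},
    "M3": {"r2"},
}

# P(M) is proportional to exp(sum of weights of rules M satisfies).
unnormalized = {
    m: math.exp(sum(weighted_rules[r] for r in satisfied))
    for m, satisfied in models.items()
}
z = sum(unnormalized.values())
for m, w in unnormalized.items():
    print(f"P({m}) = {w / z:.3f}")
```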

This thesis realizes two implementations of LPMLN based on the reductions from LPMLN to ASP and from LPMLN to MLN. It first presents an implementation called LPMLN2ASP that uses standard ASP solvers for computing MAP inference using weak constraints, and marginal and conditional probabilities using stable model enumeration. Next, another implementation called LPMLN2MLN is presented that uses MLN solvers, which apply completion to compute the tight fragment of LPMLN programs, for MAP inference and for marginal and conditional probabilities. The computation using ASP solvers yields exact inference, as opposed to approximate inference using MLN solvers. Using these implementations, the usefulness of LPMLN for computing other formalisms is demonstrated by reducing them to LPMLN. The thesis also shows how the implementations outperform the native solvers of some of these formalisms on certain domains. The implementations make use of current state-of-the-art solving technologies in ASP and MLN, and therefore benefit from any theoretical and practical advances in those technologies, thereby also benefiting the computation of other formalisms that can be reduced to LPMLN. Furthermore, the implementations allow certain SRL formalisms to be computed by ASP solvers, and certain KR formalisms to be computed by MLN solvers.
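
The enumeration side of LPMLN2ASP can be pictured with the clingo Python API, assuming it is installed. The tiny program and the uniform weighting below are placeholder stand-ins for the weights the real translation derives.

```python
import clingo

# Enumerate all stable models of a toy program, then read off a marginal.
program = "{a}. b :- a."
models = []

ctl = clingo.Control(["0"])  # "0" asks clingo to enumerate all stable models
ctl.add("base", [], program)
ctl.ground([("base", [])])
ctl.solve(on_model=lambda m: models.append({str(s) for s in m.symbols(atoms=True)}))

# With uniform weights, the marginal of atom "b" is the fraction of
# stable models containing it; the real tool uses log-linear weights.
p_b = sum("b" in m for m in models) / len(models)
print(f"stable models: {models}")
print(f"P(b) = {p_b:.2f} under uniform weights")
```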
Date Created
2017

Representing Hybrid Transition Systems in an Action Language Modulo ODEs

Description
Several physical systems in the real world involve continuous as well as discrete changes. These range from natural dynamic systems, like a bouncing ball, to robotic dynamic systems, such as planning the motion of a robot across obstacles. The key aspect of effectively describing such dynamic systems is the ability to plan and verify the evolution of the continuous components of the system while simultaneously maintaining critical constraints. Developing a framework that can effectively represent and find solutions to such physical systems would be highly advantageous. Both hybrid automata and action languages are formal models for describing the evolution of dynamic systems. The action language C+ is a rich and expressive framework for formalizing physical systems, but it can be used only with systems in the discrete domain and offers limited support for the continuous components of such systems. Hybrid Automata is a well-established formalism for representing such complex physical systems at a theoretical level; however, it is not expressive enough to capture the complex relations between the components of the system the way C+ does.

This thesis focuses on establishing a formal relationship between these two formalisms by showing how to succinctly represent Hybrid Automata in an action language, which in turn is defined as a high-level notation for answer set programming modulo theories (ASPMT), an extension of answer set programs to the first-order level. Furthermore, this encoding framework is shown to be more effective and expressive than Hybrid Automata by highlighting its ability to let states of a hybrid transition system be defined by complex relations among components that would otherwise be abstracted away in Hybrid Automata. The framework is realized in the implementation of the system CPLUS2ASPMT, which takes advantage of the state-of-the-art ODE (ordinary differential equation) based SMT solver dReal to support ODE-based evolution of the continuous components of a dynamic system.
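
The bouncing ball mentioned above is the textbook hybrid system: continuous free-fall dynamics punctuated by a discrete jump at each impact. The sketch below simulates that behavior with SciPy's event mechanism purely for illustration; it is not the CPLUS2ASPMT/dReal encoding, and the restitution coefficient is an assumed value.

```python
from scipy.integrate import solve_ivp

G = 9.81           # gravity (m/s^2)
RESTITUTION = 0.8  # fraction of speed kept after a bounce (assumed)

def free_fall(t, state):
    height, velocity = state
    return [velocity, -G]  # continuous dynamics: an ODE

def hit_ground(t, state):
    return state[0]          # event fires when height crosses zero
hit_ground.terminal = True   # stop integration at the discrete transition
hit_ground.direction = -1    # only when falling

state, t0 = [10.0, 0.0], 0.0
for bounce in range(3):
    sol = solve_ivp(free_fall, (t0, t0 + 10), state, events=hit_ground)
    t0 = sol.t[-1]
    # Discrete transition: reverse and damp the velocity at impact.
    state = [0.0, -RESTITUTION * sol.y[1, -1]]
    print(f"bounce {bounce + 1} at t={t0:.2f}s, rebound speed {state[1]:.2f} m/s")
```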
Date Created
2017

On the Relationships Among Probabilistic Extensions of Answer Set Semantics

Description
Answer Set Programming (ASP) is one of the main formalisms in Knowledge Representation (KR) and is widely applied across a large number of domains. While ASP is effective on Boolean decision problems, it has difficulty expressing quantitative uncertainty and probability in a natural way.

Logic Programs under the answer set semantics and Markov Logic Network (LPMLN) is a recent extension of answer set programs that overcomes the deterministic nature of ASP by adopting the log-linear weight scheme of Markov Logic. This thesis investigates the relationships between LPMLN and two other extensions of ASP: weak constraints, which express a quantitative preference among answer sets, and P-log, which incorporates probabilistic uncertainty. The studied relationships show how different extensions of answer set programs are related to each other, and how they are related to formalisms in Statistical Relational Learning, such as ProbLog and MLN, which have been shown to be closely related to LPMLN. The studied relationships compare the properties of the involved languages and provide ways to compute one language using an implementation of another.

This thesis first presents a translation of LPMLN into programs with weak constraints. The translation allows for computing the most probable stable models (i.e., MAP estimates) or the probability distribution of LPMLN programs using standard ASP solvers, so that the well-developed techniques in ASP can be utilized. This result extends to other formalisms, such as Markov Logic, ProbLog, and Pearl's Causal Models, that are shown to be translatable into LPMLN.
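
The flavor of such a translation can be sketched as follows: each soft rule receives an auxiliary unsat atom, and a weak constraint penalizes stable models that leave the rule unsatisfied. This is an illustrative sketch only; the exact encoding in the thesis may differ, and the rule and weight below are made up.

```python
# Emit the weak-constraint encoding for a soft rule "w : head :- body".
# ASP weak constraints take integer penalties, so weights are integers here.

def translate_soft_rule(index, weight, head, body):
    unsat = f"unsat({index})"
    return "\n".join([
        f"{unsat} :- {body}, not {head}.",       # rule left unsatisfied
        f"{head} :- {body}, not {unsat}.",       # or the rule fires
        f":~ {unsat}. [{weight}@0, {index}]",    # penalize unsatisfied rules
    ])

print(translate_soft_rule(1, 10, "bird(jo)", "residentbird(jo)"))
```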

This thesis also presents a translation of P-log into LPMLN. The translation shows how the probabilistic nonmonotonicity of P-log (the ability of the reasoner to change its probabilistic model as a result of new information) can be represented in LPMLN, which yields a way to compute P-log using standard ASP solvers or MLN solvers.
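
Plain conditioning in a log-linear model already shows how probabilities shift with new information, though P-log's probabilistic nonmonotonicity is a richer phenomenon; the toy sketch below only conveys that flavor, with invented models and weights.

```python
import math

# Invented stable models (as atom sets) and their log-linear weights.
models = {"M1": {"rain"}, "M2": {"rain", "wet"}, "M3": {"wet"}}
weight = {"M1": math.exp(1.0), "M2": math.exp(2.0), "M3": math.exp(0.5)}

def prob(atom, evidence=None):
    # Conditioning: renormalize over models consistent with the evidence.
    keep = {m for m in models if evidence is None or evidence in models[m]}
    z = sum(weight[m] for m in keep)
    return sum(weight[m] for m in keep if atom in models[m]) / z

print(f"P(rain)       = {prob('rain'):.3f}")
print(f"P(rain | wet) = {prob('rain', evidence='wet'):.3f}")
```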
Date Created
2017

Big data analysis of bacterial inhibitors in parallelized cellomics: a machine learning approach

Description
Identifying chemical compounds that inhibit bacterial infection has recently gained considerable attention given the increased number of highly resistant bacteria and the serious health threat they pose around the world. With the development of automated microscopy and image analysis systems, the process of identifying novel therapeutic drugs can generate an immense amount of data, easily reaching terabytes of information. Despite the vast amount of data currently generated, traditional analytical methods have not increased the overall success rate of identifying active chemical compounds that eventually become novel therapeutic drugs. Moreover, multispectral imaging has become ubiquitous in drug discovery due to its ability to provide valuable information on cellular and sub-cellular processes using fluorescent reagents. These reagents are often costly and toxic to cells over extended periods, limiting experimental design. Thus, there is a significant need for a more efficient process of identifying active chemical compounds.

This dissertation introduces novel machine learning methods based on parallelized cellomics to analyze interactions between cells, bacteria, and chemical compounds while reducing the use of fluorescent reagents. Machine learning analysis using image-based high-content screening (HCS) data is compartmentalized into three primary components: (1) Image Analytics, (2) Phenotypic Analytics, and (3) Compound Analytics. A novel software analytics tool called the Insights project is also introduced. The Insights project fully incorporates distributed processing, high-performance computing, and database management that can rapidly and effectively utilize and store massive amounts of data generated using HCS biological assessments (bioassays). It is ideally suited for parallelized cellomics in high-dimensional space.

Results demonstrate that a parallelized cellomics approach increases the quality of a bioassay while vastly decreasing the need for control data. The reduction in control data leads to less fluorescent reagent consumption. Furthermore, a novel method that uses single-cell data points is shown to identify known active chemical compounds with a high degree of accuracy, even when traditional quality control measurements indicate the bioassay to be of poor quality. This ultimately decreases the time and resources needed to optimize bioassays while still accurately identifying active compounds.
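
A hedged sketch of the single-cell idea: train a classifier on per-cell features and score compounds by its predictions. The feature names and synthetic data below are placeholders, not the dissertation's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_cells = 1000
# Hypothetical per-cell features: cell area, bacterial spot count,
# and a stain intensity ratio, drawn from made-up distributions.
inactive = rng.normal([400, 12, 0.9], [60, 4, 0.1], size=(n_cells, 3))
active = rng.normal([380, 3, 0.4], [60, 2, 0.1], size=(n_cells, 3))

X = np.vstack([inactive, active])
y = np.array([0] * n_cells + [1] * n_cells)  # 1 = active compound

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.3f}")
```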
Date Created
2016

Answer set programming modulo theories

Description
Knowledge representation and reasoning is a prominent subject of study within the field of artificial intelligence, concerned with the symbolic representation of knowledge in such a way as to facilitate automated reasoning about it. In real-world domains, it is often necessary to perform defeasible reasoning when representing the default behaviors of systems. Answer Set Programming is a widely used knowledge representation framework that is well-suited for such reasoning tasks and has been successfully applied to practical domains due to efficient computation through grounding (a process that replaces variables with variable-free terms) and propositional solvers similar to SAT solvers. However, some domains pose a challenge for grounding-based methods, such as those requiring reasoning about continuous time or resources.
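
A toy illustration of grounding, and of why it becomes a bottleneck: a single rule with three variables over a three-element domain already yields 27 ground instances, and the count grows exponentially with the number of variables and the domain size.

```python
from itertools import product

domain = ["a", "b", "c"]
rule = "path({x},{z}) :- edge({x},{y}), path({y},{z})."

# Grounding: substitute every combination of variable-free terms.
ground_instances = [rule.format(x=x, y=y, z=z)
                    for x, y, z in product(domain, repeat=3)]
print(f"{len(ground_instances)} ground rules, e.g.:")
print(ground_instances[0])
```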

To address these domains, there have been several proposals to achieve efficiency through loose integrations with efficient declarative solvers such as constraint solvers or satisfiability modulo theories solvers. While these approaches successfully avoid substantial grounding, the loose integration makes them unsuitable for performing defeasible reasoning on functions. As a result, this expressive reasoning on functions must either be performed using predicates to simulate the functions or in a way that is not elaboration tolerant. Neither compromise is reasonable: the former suffers from the grounding bottleneck when domains are large, as is often the case in real-world domains, while the latter requires encodings to be non-trivially modified for elaborations.

This dissertation presents a novel framework called Answer Set Programming Modulo Theories (ASPMT), a tight integration of the stable model semantics and satisfiability modulo theories. This framework both supports defeasible reasoning about functions and alleviates the grounding bottleneck. Combining the strengths of Answer Set Programming and satisfiability modulo theories enables efficient continuous reasoning while still supporting rich features such as reasoning about defaults and reasoning in domains with incomplete knowledge. The framework is realized in two prototype implementations called MVSM and ASPMT2SMT, the latter of which was recently incorporated into a non-monotonic spatial reasoning system. To define the semantics of this framework, we extend the first-order stable model semantics by Ferraris, Lee, and Lifschitz to allow "intensional functions" and analyze the theoretical properties of this new formalism and its relationships to existing approaches.
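
For flavor only, the snippet below mixes a discrete (Boolean) choice with real-valued constraints using the z3 SMT solver. It is plain SMT, not the stable model semantics with intensional functions that the dissertation defines, and the driving domain is invented.

```python
from z3 import Bool, Real, Solver, If, sat

accelerate = Bool("accelerate")        # discrete action choice
speed0, speed1 = Real("speed0"), Real("speed1")
duration = Real("duration")            # continuous quantity, never grounded

s = Solver()
s.add(speed0 == 10)
s.add(duration > 0, duration <= 5)
# The action determines how the continuous state evolves.
s.add(speed1 == If(accelerate, speed0 + 3 * duration, speed0))
s.add(speed1 >= 20)                    # goal: reach at least 20 m/s

if s.check() == sat:
    m = s.model()
    print(m[accelerate], m[duration], m[speed1])
```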
Date Created
2016

Answering deep queries specified in natural language with respect to a frame based knowledge base and developing related natural language understanding components

Description
Question Answering has been under active research for decades, but it has recently taken the spotlight following IBM Watson's success on Jeopardy! and the arrival of digital assistants such as Apple's Siri, Google Now, and Microsoft Cortana on every smartphone and browser. However, most research in Question Answering targets factual questions rather than deep ones such as "How" and "Why" questions.

In this dissertation, I suggest a different approach to tackling this problem: the answers to deep questions need to be formally defined before they can be found.

Because these answers must be defined with respect to a representation more structured than raw natural language text, I define Knowledge Description Graphs (KDGs), a graphical structure containing information about events, entities, and classes. I then propose formulations and algorithms to construct KDGs from a frame-based knowledge base, define the answers to various "How" and "Why" questions with respect to KDGs, and show how to obtain the answers from KDGs using Answer Set Programming. Moreover, I discuss how to derive missing information when constructing KDGs from an under-specified knowledge base, and how to answer many factual question types with respect to the knowledge base.
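
A hypothetical sketch of what a KDG might look like as a labeled directed graph over events, entities, and classes, using networkx. The node and edge labels are invented; the dissertation's formal definition governs what KDGs actually contain.

```python
import networkx as nx

kdg = nx.DiGraph()
kdg.add_node("boiling", kind="event")
kdg.add_node("water", kind="entity")
kdg.add_node("liquid", kind="class")
kdg.add_edge("water", "liquid", label="instance_of")
kdg.add_edge("boiling", "water", label="theme")
kdg.add_edge("boiling", "evaporation", label="causes")  # supports "Why" answers

# A "Why does water evaporate?" style query could walk "causes" edges:
causes = [u for u, v, d in kdg.edges(data=True)
          if d.get("label") == "causes" and v == "evaporation"]
print(causes)  # -> ['boiling']
```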

Having defined the answers to various questions with respect to a knowledge base, I extend the research to use natural language text for specifying deep questions and the knowledge base, and to generate natural language text from those specifications. Toward these goals, I developed NL2KR, a system that helps translate natural language into formal language. I show NL2KR's use in translating "How" and "Why" questions, and in generating simple natural language sentences from a natural language KDG specification. Finally, I discuss applications of the components I developed in Natural Language Understanding.
Date Created
2015

SmartGateway Framework

Description
Cisco estimates that by 2020, 50 billion devices will be connected to the Internet, yet 99% of things today remain isolated and unconnected. Differing connectivity protocols, proprietary access, varied device characteristics, and security concerns are the main reasons for that isolated state. This project aims at designing and building a prototype gateway that exposes a simple and intuitive HTTP RESTful interface to access and manipulate devices and the data they produce, while addressing most of the issues listed above. Along with manipulating devices, the framework exposes sensor data in such a way that it can be used to create applications, like rules or events, that make the home smarter. It also allows the user to represent high-level knowledge by aggregating low-level sensor data. This high-level representation can be considered a property of the environment or object rather than of the sensor itself, which makes interpreting the values more intuitive and accessible.
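
A minimal sketch of the kind of RESTful device interface described above, written with Flask. The routes, device names, and in-memory registry are illustrative assumptions, not the project's actual API.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
# Hypothetical device registry; a real gateway would sit behind a
# device abstraction layer bridging heterogeneous protocols.
devices = {
    "thermostat1": {"type": "thermostat", "temperature": 21.5},
    "lamp1": {"type": "light", "on": False},
}

@app.get("/devices")
def list_devices():
    return jsonify(devices)

@app.get("/devices/<device_id>")
def read_device(device_id):
    return jsonify(devices.get(device_id, {"error": "unknown device"}))

@app.put("/devices/<device_id>")
def update_device(device_id):
    # e.g. PUT /devices/lamp1 with body {"on": true}
    devices.setdefault(device_id, {}).update(request.get_json())
    return jsonify(devices[device_id])

if __name__ == "__main__":
    app.run(port=8080)
```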
Date Created
2015

Dynamic analysis of embedded software

Description
Most embedded applications are constructed with multiple threads to handle concurrent events. For optimization and debugging of such programs, dynamic program analysis is widely used to collect execution information while the program is running. Unfortunately, the non-deterministic behavior of multithreaded embedded software makes dynamic analysis difficult. In addition, the instrumentation overhead incurred while gathering execution information may change the execution of a program and lead to distorted analysis results, i.e., probe effect. This thesis presents a framework that tackles the non-determinism and probe effect incurred in dynamic analysis of embedded software. The thesis largely consists of three parts. First, it presents a deterministic replay framework that provides reproducible execution: once a program execution is recorded, software instrumentation can be safely applied during replay without probe effect. Second, probe effect is discussed and a simulation-based analysis is proposed to detect execution changes caused by instrumentation overhead; this analysis examines whether the recording instrumentation changes the original program execution. Lastly, the thesis discusses data race detection algorithms that help remove data races for the correctness of both the replay and the simulation-based analysis. The focus is on making the detection efficient for C/C++ programs and on increasing the scalability of the detection on multi-core machines.
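
As one classic member of the family of data race detection algorithms discussed, here is a minimal lockset-style (Eraser-like) sketch; the thesis's own algorithms may differ, and the trace below is invented.

```python
from collections import defaultdict

candidate = {}                # variable -> set of locks still consistent
accessors = defaultdict(set)  # variable -> threads that touched it

def on_access(thread, variable, locks_held):
    # Intersect the variable's candidate lockset with the locks held now;
    # an empty lockset across multiple threads flags a potential race.
    accessors[variable].add(thread)
    if variable not in candidate:
        candidate[variable] = set(locks_held)
    else:
        candidate[variable] &= locks_held
    if not candidate[variable] and len(accessors[variable]) > 1:
        print(f"potential race on {variable!r} by {sorted(accessors[variable])}")

# Trace: thread A protects x with lock L, thread B accesses x with no lock.
on_access("A", "x", {"L"})
on_access("B", "x", set())    # -> potential race on 'x'
```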
Date Created
2015

Solving Winograd Schema Challenge: using semantic parsing, automatic knowledge acquisition and logical reasoning

Description
The Turing test has been a benchmark for measuring human-level intelligence in computers since Alan Turing proposed it in 1950. However, over the last 60 years, applications such as ELIZA, PARRY, Cleverbot, and Eugene Goostman have claimed to pass the test. These applications either rely on tricks to fool humans in a text-based chat, or there has been disagreement within the AI community about whether they truly passed. This has led to the school of thought that the Turing test might not be the ideal test for predicting human-level intelligence in machines.

Consequently, the Winograd Schema Challenge has been suggested as an alternative to the Turing test. As opposed to judging intelligent behavior through open-ended chat, as in the Turing test, the Winograd Schema Challenge is a question answering test. It consists of sentence and question pairs such that the answer to the question depends on the resolution of a definite pronoun or adjective in the sentence. For example, in "The trophy doesn't fit in the brown suitcase because it's too big," deciding whether "it" refers to the trophy or the suitcase requires knowing that the thing too big to fit is the trophy. Such answers are fairly intuitive for humans, but they are difficult for machines because finding them requires some background or commonsense knowledge about the sentence.

In this thesis, I propose a novel technique to solve the Winograd Schema Challenge. The technique has three basic modules at its disposal: a Semantic Parser that parses the English text (both sentences and questions) into a formal representation; an Automatic Background Knowledge Extractor that extracts the Background Knowledge pertaining to the given Winograd sentence; and an Answer Set Programming Reasoning Engine that reasons over the given Winograd sentence and the corresponding Background Knowledge. The applicability of the technique is illustrated by solving a subset of the Winograd Schema Challenge pertaining to a certain type of Background Knowledge; the technique achieves notable accuracy on this subset.
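
A toy end-to-end sketch of the pipeline's final stage, assuming the clingo Python module: hand-written facts standing in for the Semantic Parser's output, one background-knowledge rule, and ASP reasoning that resolves the pronoun. The representation is invented for illustration; the thesis's modules produce richer encodings.

```python
import clingo

program = """
% Parsed sentence: "The man couldn't lift his son because he was so weak."
candidate(man). candidate(son).
cannot_lift(man, son).
pronoun_property(weak).

% Background knowledge: weakness explains failing to lift, so the
% pronoun refers to the lifter.
refers_to(X) :- cannot_lift(X, _), pronoun_property(weak).
"""

ctl = clingo.Control()
ctl.add("base", [], program)
ctl.ground([("base", [])])
ctl.solve(on_model=lambda m: print([str(s) for s in m.symbols(atoms=True)
                                    if s.name == "refers_to"]))
# -> ['refers_to(man)']
```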
Date Created
2014