Toward Inductive Reverse Engineering of Web Applications

Description
In the area of hardware, reverse engineering has traditionally focused on developing clones: duplicated components that perform the same functionality as the original component. While reverse engineering techniques have been applied to software, these techniques have instead focused on understanding high-level software designs to ease the software maintenance burden. This approach works well for traditional applications whose source code is available; however, there are circumstances, particularly regarding web applications, where it would be very beneficial to clone a web application for which no source code is present, e.g., for security testing of the application or for offline mock testing of a third-party web service. We call this the web application cloning problem.
This thesis presents a possible solution to the problem of web application cloning. Our approach is a novel application of inductive programming, which we call inductive reverse engineering. The goal of inductive reverse engineering is to automatically reverse engineer an abstraction of the web application’s code in a completely black-box manner. We build this approach using recent advances in inductive programming, and we solve several technical challenges to scale the inductive programming techniques to realistic-sized web applications. We target the initial version of our inductive reverse engineering tool to a subset of web applications, i.e., those that do not store state and those that do not have loops. We introduce an evaluation methodology for web application cloning techniques and evaluate our approach on several real-world web applications. The results indicate that inductive reverse engineering can effectively reverse engineer specific types of web applications. In the future, we hope to extend the power of inductive reverse engineering to web applications with state and to learn loops, while still maintaining tractability.
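
To illustrate the core idea, here is a minimal sketch in Python (purely illustrative; the thesis's synthesizer and its web-specific language are not reproduced here) of inductive synthesis in a black-box setting: enumerate compositions of primitive operations and keep those consistent with every observed input/output pair.

    # Enumerate candidate programs (compositions of simple string
    # operations) and keep those that reproduce all observed I/O pairs.
    # The primitive set below is hypothetical, chosen for the example.
    from itertools import product

    PRIMITIVES = {
        "upper": lambda s: s.upper(),
        "strip": lambda s: s.strip(),
        "greet": lambda s: "Hello, " + s,
    }

    def synthesize(io_pairs, max_depth=2):
        """Return every composition of primitives, up to max_depth,
        consistent with all observed input/output pairs."""
        consistent = []
        for depth in range(1, max_depth + 1):
            for names in product(PRIMITIVES, repeat=depth):
                def program(s, names=names):
                    for name in names:
                        s = PRIMITIVES[name](s)
                    return s
                if all(program(i) == o for i, o in io_pairs):
                    consistent.append(names)
        return consistent

    # Observed black-box behavior of the web application:
    pairs = [("alice", "Hello, ALICE"), ("bob", "Hello, BOB")]
    print(synthesize(pairs))  # [('upper', 'greet')]

Real web applications require a much richer operation set, and the search space grows exponentially with program depth; pruning that space is exactly the scaling challenge the thesis addresses.
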
Date Created
2017-05

On the Application of Malware Clustering for Threat Intelligence Synthesis

Description
Malware forensics is a time-consuming process that involves a significant amount of data collection. To ease the load on security analysts, many attempts have been made to automate the intelligence gathering process and provide a centralized search interface. Some of these solutions map existing relations between threats and can discover new intelligence by identifying correlations in the data. However, such systems generally treat each unique malware sample as its own distinct threat. This fails to model the real malware landscape, in which so many "new" samples are actually variants of samples that have already been discovered. Were there some way to reliably determine whether two malware samples belong to the same family, intelligence for one sample could be applied to any sample in the family, greatly reducing the complexity of intelligence synthesis. Clustering is a common big-data approach for grouping data samples that share common features, and it has been applied in several recent papers to identify related malware. It therefore has the potential to simplify the intelligence synthesis process as described. However, existing threat intelligence systems do not use malware clustering. In this paper, we attempt to design a highly accurate malware clustering system, with the ultimate goal of integrating it into a threat intelligence platform. Toward this end, we explore the many considerations of designing such a system: how to extract features to compare malware, and how to use these features for accurate clustering. We then create an experimental clustering system and evaluate its effectiveness using two different clustering algorithms.
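
As a rough illustration of the clustering step, the Python sketch below groups toy malware feature vectors with DBSCAN; the features, values, and thresholds are made up for the example and are not the feature set studied in the paper.

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import DBSCAN

    # Rows: samples; columns (illustrative): file size in KB,
    # imported-API count, PE section count, entropy of packed section.
    features = np.array([
        [310.0,  42, 5, 6.1],
        [305.0,  40, 5, 6.0],   # near-duplicate of sample 0: a variant
        [980.0, 210, 9, 7.8],
        [975.0, 205, 9, 7.9],   # near-duplicate of sample 2: a variant
        [120.0,  12, 3, 4.2],
    ])

    # Scale features so no single dimension dominates the distance
    # metric, then cluster; DBSCAN labels outliers as -1 (singletons).
    labels = DBSCAN(eps=0.8, min_samples=2).fit_predict(
        StandardScaler().fit_transform(features))
    print(labels)  # expected: [0 0 1 1 -1], two families plus an outlier

Samples that land in the same cluster would then share threat intelligence, which is precisely the simplification the paper aims for.
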
Date Created
2017-05

Filesystem I/O Tracing and Replaying

Description
As mobile devices have risen to prominence over the last decade, their importance has been increasingly recognized. Workloads for mobile devices are often very different from those on desktop and server computers, and solutions that worked in the past are not always the best fit for the resource- and energy-constrained computing that characterizes mobile devices. While this is most commonly seen in CPU and graphics workloads, this device-class difference extends to I/O as well. However, while a few tools exist to help analyze mobile storage solutions, there is a gap in the available software that prevents quality analysis of certain research initiatives, such as I/O deduplication on mobile devices. This honors thesis demonstrates a new tool that captures I/O at the filesystem layer of mobile devices running the Android operating system, in support of new mobile storage research. Uniquely, it is able to capture both the metadata of writes and the actual written data, transparently to the apps running on the device. Based on a modification of the strace program, fstrace and its companion tool fstrace-replay can record and replay the filesystem I/O of actual Android apps. Using this new tracing tool, several traces from popular Android apps such as Facebook and Twitter were collected and analyzed.
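
fstrace's own interface is not reproduced here; as a rough desktop analogue of the capture side, the Python sketch below drives stock strace to log file-related system calls of a command (capturing the written data itself, as fstrace does, would additionally need strace's data-dumping options or a modified strace).

    import subprocess

    def trace_file_io(command, log_path="io_trace.txt"):
        """Run `command` under strace, logging file-related syscalls."""
        subprocess.run(
            ["strace",
             "-f",                                   # follow children
             "-e", "trace=openat,read,write,close",  # filesystem calls
             "-o", log_path] + command,
            check=True)
        return log_path

    # Example: trace a short command and inspect the resulting log.
    log = trace_file_io(["cat", "/etc/hostname"])
    print(open(log).read()[:500])
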
Date Created
2017-05

TSCAN: Toward a Static and Customizable Analysis for Node.js

Description
Node.js is an extremely popular development framework for web applications. The appeal of its event-driven, asynchronous flow and the convenience of JavaScript as its programming language have driven its rapid growth, and it is currently deployed by leading companies in retail, finance, and other important sectors. However, the tools currently available for Node.js developers to secure their applications against malicious attackers are notably scarce. While a substantial number of security tools have been created for web applications in many other languages such as PHP and Java, very little exists for Node.js applications. This gap could compromise private information belonging to companies such as PayPal and WalMart. We propose a tool to statically analyze Node.js web applications for five popular vulnerabilities: cross-site scripting, SQL injection, server-side request forgery, command injection, and code injection. We base our tool on JSAI, a platform created to parse client-side JavaScript for security risks. JSAI is novel because of its configuration capabilities, which allow a user to choose between various analysis options at runtime in order to select the most thorough analysis with the least amount of processing time. We contribute to the development of our tool by rigorously analyzing and documenting vulnerable functions and objects in Node.js that are relevant to the vulnerabilities we have selected. We intend to use this documentation to build a robust Node.js static analysis tool, and we hope that other developers will also incorporate this analysis into their Node.js security projects.
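
As a greatly simplified illustration of source-to-sink checking (a real JSAI-based analysis performs abstract interpretation over the parsed program rather than pattern matching, and the sink/source lists below are illustrative, not the thesis's documented set):

    import re

    # Hypothetical sinks for the five vulnerability classes and
    # hypothetical user-controlled sources; a production tool would
    # document these exhaustively.
    SINKS = {
        "eval":               "code injection",
        "child_process.exec": "command injection",
        "connection.query":   "SQL injection",
        "res.send":           "cross-site scripting",
    }
    SOURCES = ["req.query", "req.body", "req.params"]  # user input

    def scan(source_code):
        findings = []
        for lineno, line in enumerate(source_code.splitlines(), 1):
            for sink, vuln in SINKS.items():
                m = re.search(re.escape(sink) + r"\((.*)\)", line)
                if m and any(src in m.group(1) for src in SOURCES):
                    findings.append((lineno, sink, vuln))
        return findings

    app = 'app.get("/run", (req, res) => { child_process.exec(req.query.cmd); });'
    print(scan(app))  # [(1, 'child_process.exec', 'command injection')]
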
Date Created
2017-05

Moving Target Defense Using Live Migration of Docker Containers

Description
Today's information technology systems have addresses, software stacks, and other configuration that remain unchanged for long periods of time. This paves the way for malicious attacks on the system through unknown vulnerabilities, since attackers can take advantage of the static configuration to plan their attacks with ample time. To protect a system from this threat, Moving Target Defense is required, in which the attack surface is dynamically changed, making the system difficult to strike.

In this thesis, I incorporate live migration of Docker containers using CRIU (Checkpoint/Restore In Userspace) for moving target defense. There are 460K Dockerized applications, a 3100% growth over two years [1]. Over 4 billion containers have been pulled so far from Docker Hub. Docker is supported by a large and fast-growing community of contributors and users; as an example, there are 125K Docker Meetup members worldwide. As industry adopts Docker rapidly, a moving target defense solution involving containers is attractive for being robust and fast. A proof-of-concept implementation is included for studying the performance attributes of Docker migration.

Attack detection uses a scenario involving definitions of normal events on servers. By defining normal system activities and extracting syslog data to a centralized server, attacks can be detected by identifying abnormal activities, and this detection can serve as the trigger for the Docker migration.
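
A rough sketch of one migration round using Docker's experimental CRIU-backed checkpoint/restore commands is shown below; the host names and the rsync transfer are placeholders, and the thesis's actual proof of concept may differ in its mechanics.

    import subprocess

    def run(cmd):
        subprocess.run(cmd, shell=True, check=True)

    def migrate(container, target_host, ckpt="mtd-ckpt",
                ckpt_dir="/tmp/ckpts"):
        # 1. Freeze the running container's state to disk (requires
        #    CRIU and a Docker daemon with experimental features on).
        run(f"docker checkpoint create --checkpoint-dir {ckpt_dir} "
            f"{container} {ckpt}")
        # 2. Ship the checkpoint image to the target host.
        run(f"rsync -a {ckpt_dir}/ {target_host}:{ckpt_dir}/")
        # 3. Resume on the target, assuming a container of the same
        #    name and image has already been created there.
        run(f"ssh {target_host} 'docker start --checkpoint {ckpt} "
            f"--checkpoint-dir {ckpt_dir} {container}'")

    # Moving target defense: periodically hop the service between hosts.
    migrate("web-frontend", "backup-node-01")
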
Date Created
2017

Security Analysis of HTTP/2 Protocol

Description
Internet traffic today consists mostly of Hypertext Transfer Protocol (HTTP) traffic. The first version of the HTTP protocol was standardized in 1991, followed by a major upgrade in May 2015. HTTP/2 is the next generation of the HTTP protocol, promising to resolve the shortcomings of HTTP/1.1 and to provide features that greatly improve its performance.

There has been a 1000% increase in the cyber crime rate over the past two years. Since HTTP/2 is a relatively new protocol with a very high adoption rate (around 68% of all HTTPS traffic), there is an urgent need to analyze it from a security vulnerability perspective.

In this thesis, I systematically analyze the security concerns in the HTTP/2 protocol, starting from the specification, testing all variations of frames (the basic unit of communication in HTTP/2), and examining every newly introduced feature.

I also propose the Context-Aware Fuzz Testing for Binary Communication Protocols methodology. Using this testing methodology, I was able to discover a serious security vulnerability through which an attacker can carry out a denial-of-service attack on Apache.
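
As an illustration of frame-level fuzzing, the sketch below hand-builds an HTTP/2 frame (9-byte header: 24-bit length, type, flags, 31-bit stream identifier) whose declared length disagrees with its payload, and sends it after the client connection preface. The host is a placeholder, this is not the thesis's actual fuzzer, and HTTP/2 over TLS would additionally require ALPN negotiation.

    import random
    import socket
    import struct

    PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"

    def frame(length, ftype, flags, stream_id, payload):
        header = struct.pack(">I", length)[1:]   # 24-bit length field
        header += struct.pack(">BBI", ftype, flags,
                              stream_id & 0x7FFFFFFF)
        return header + payload

    def fuzz_once(host, port=80):
        payload = bytes(random.randrange(256) for _ in range(16))
        # Context-aware mutation: declare a length that disagrees with
        # the real payload size on a SETTINGS frame (type 0x4), which
        # the peer is obliged to parse on stream 0.
        f = frame(random.randrange(1 << 24), ftype=0x4, flags=0,
                  stream_id=0, payload=payload)
        with socket.create_connection((host, port), timeout=5) as s:
            s.sendall(PREFACE + f)
            return s.recv(4096)  # observe the server's reaction

    print(fuzz_once("test-server.local"))
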
Date Created
2017

Categorization of Phishing Detection Features And Using the Feature Vectors to Classify Phishing Websites

Description
Phishing is a form of online fraud where a spoofed website tries to gain access to a user's sensitive information by tricking the user into believing that it is a benign website. There are several approaches to detecting phishing attacks, such as educating users, maintaining blacklists, or extracting characteristics found to exist in phishing attacks. In this thesis, we analyze approaches that extract features from phishing websites and train classification models with the extracted feature sets to classify phishing websites. We create an exhaustive list of all features used in these approaches and categorize them into 6 broader categories and 33 finer categories. We extract 59 features from the URL, URL redirects, hosting domain (WHOIS and DNS records), and popularity of the website, and analyze their robustness in classifying a phishing website. Our emphasis is on determining the predictive performance of robust features. We evaluate the classification accuracy when using the entire feature set and when URL features or site popularity features are excluded from the feature set, and we show how our approach can be used to effectively predict specific types of phishing attacks such as shortened URLs and randomized URLs. Using both decision table classifiers and neural network classifiers, we find that robust features have enough predictive power to be used in practice.
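
As a small illustration of the pipeline's shape, the sketch below derives a few URL-based features and trains a classifier; the features are a made-up subset (not the 59 used in the thesis) and the training data is fabricated for the example.

    from urllib.parse import urlparse
    from sklearn.tree import DecisionTreeClassifier

    def url_features(url):
        host = urlparse(url).netloc
        return [
            len(url),                             # overall URL length
            host.count("."),                      # subdomain depth
            int(any(c.isdigit() for c in host)),  # digits in hostname
            int("@" in url or "-" in host),       # suspicious characters
        ]

    labeled_urls = [
        ("https://example.com/login", 0),
        ("https://mail.example.com/inbox", 0),
        ("http://secure-paypa1.example-update.com/a@b/verify", 1),
        ("http://192.168.0.1.evil-bank-login.example.com/session", 1),
    ]
    X = [url_features(u) for u, _ in labeled_urls]
    y = [label for _, label in labeled_urls]

    model = DecisionTreeClassifier().fit(X, y)
    print(model.predict(
        [url_features("http://login-update.bank1.example.net/x")]))
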
Date Created
2017

CSM Automated Confidence Score Measurement of Threat Indicators

Description
The volume and frequency of cyber attacks have exploded in recent years. Organizations subscribe to multiple threat intelligence feeds to increase their knowledge base and better equip their security teams with the latest information in the threat intelligence domain. Though such subscriptions add intelligence and can help in making more informed decisions, organizations have to put considerable effort into processing and analyzing a large number of threat indicators. This problem is worsened by the large number of false positives and irrelevant events that existing threat feed sources report as indicators. It is often neither practical nor cost-effective to analyze every single alert, considering the staggering volume of indicators. This motivates solving the problem of overcrowded threat indicators by prioritizing and filtering them.

To overcome this issue, I explain the necessity of determining how likely a reported indicator is to be malicious given the evidence, and of prioritizing indicators based on that determination. The Confidence Score Measurement system (CSM) introduces the concept of a confidence score: it assigns each threat indicator a score of being malicious based on the evaluation of different threat intelligence systems. An indicator propagates maliciousness to adjacent indicators based on relationships determined from its behavior. The propagation algorithm derives a final confidence score that determines the overall maliciousness of the threat indicator. CSM can prioritize the indicators based on confidence score; however, an analyst may not be interested in the entire result set, so CSM narrows down the results based on analyst-driven input. To this end, CSM introduces the concept of a relevance score, which combines the confidence score with analyst-driven search by applying full-text search techniques, and it prioritizes the results by relevance score to provide meaningful results to the analyst. The analysis shows that the propagation algorithm of CSM scales linearly with larger datasets and achieves 92% accuracy in determining threat indicators. The evaluation of the results demonstrates the effectiveness and practicality of the approach.
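
The sketch below illustrates the propagation idea on a toy indicator graph; the damping factor, base scores, and relationships are illustrative, not CSM's actual parameters.

    # Base confidence of maliciousness per indicator, from feed
    # evaluation; edges are relationships inferred from behavior.
    base = {"evil.example.com": 0.9, "10.1.2.3": 0.2,
            "dropper:sha256": 0.1, "cdn.example.net": 0.05}
    edges = [("evil.example.com", "10.1.2.3"),
             ("10.1.2.3", "dropper:sha256"),
             ("dropper:sha256", "cdn.example.net")]

    def propagate(base, edges, damping=0.5, iterations=10):
        score = dict(base)
        neighbors = {n: [] for n in base}
        for a, b in edges:
            neighbors[a].append(b)
            neighbors[b].append(a)
        for _ in range(iterations):
            # Each indicator keeps its own score or inherits a damped
            # share of a neighbor's score, whichever is larger.
            score = {n: max([score[n]] +
                            [damping * score[m] for m in neighbors[n]])
                     for n in score}
        return score

    # Indicators tied to a high-confidence indicator inherit suspicion.
    for ind, s in sorted(propagate(base, edges).items(),
                         key=lambda kv: -kv[1]):
        print(f"{s:.3f}  {ind}")
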
Date Created
2017

Next Generation Black-Box Web Application Vulnerability Analysis Framework

Description
Web applications are an incredibly important aspect of our modern lives. Organizations and developers use automated vulnerability analysis tools, also known as scanners, to automatically find vulnerabilities in their web applications during development. Scanners have traditionally fallen into two types of approaches: black-box and white-box. In black-box approaches, the scanner does not have access to the source code of the web application, whereas a white-box approach has access to the source code. Today's state-of-the-art black-box vulnerability scanners employ various methods to fuzz and detect vulnerabilities in a web application. However, these scanners fuzz the web application with a number of known payloads to try to trigger a vulnerability. This technique is simple but does not understand the web application that it is testing. This thesis presents a new approach to vulnerability analysis. The vulnerability analysis module presented uses a novel approach, Inductive Reverse Engineering (IRE), to understand and model the web application. IRE first attempts to understand the behavior of the web application by giving a certain number of input/output pairs to the web application. Then, the IRE module hypothesizes a set of programs (in a limited language specific to web applications, called AWL) that satisfy the input/output pairs. These hypotheses take the form of a directed acyclic graph (DAG). The AWL vulnerability analysis module can then attempt to detect vulnerabilities in this DAG. Further, it generates a payload based on the DAG, and therefore this payload will be a precise payload to trigger the potential vulnerability (based on our understanding of the program). It then tests this potential vulnerability using the generated payload on the actual web application, and it creates a verification procedure to see if the potential vulnerability is actually exploitable, based on the web application's response.
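
As an illustration of the final verify step, the sketch below checks a hypothesized reflection model against the live application; the model, payload, and URL are placeholders rather than the thesis's AWL machinery.

    import requests

    # One hypothesized program from the learned model: the application
    # reflects the `name` parameter directly into its HTML output.
    def hypothesized_output(name):
        return f"<p>Hello {name}</p>"

    PAYLOAD = "<script>alert(1)</script>"

    def verify_xss(url, param="name"):
        # The hypothesis predicts the payload survives unescaped;
        # confirm against the actual web application's response.
        predicted = PAYLOAD in hypothesized_output(PAYLOAD)
        response = requests.get(url, params={param: PAYLOAD}, timeout=10)
        return predicted and PAYLOAD in response.text

    print(verify_xss("http://target.example/greet"))
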
Date Created
2017

Shadow Phone and Ghost SIM: A Step Toward Geolocation Anonymous Calling

Description
Mobile telephony is a critical aspect of our modern society: through telephone calls, it is possible to reach almost anyone around the globe. However, every mobile telephone call placed implicitly leaks the user's location to the telephony service provider (TSP). This privacy leakage is due to the fundamental nature of mobile telephony: a phone must connect to a local base station to receive service and place calls. Thus, the TSP can track the physical location of the user for every call that they place. While the Internet is similar in this regard, privacy-preserving technologies such as Tor allow users to connect to websites anonymously (without revealing to their ISP the site that they are visiting). In this thesis, a scheme called shadow calling is presented to allow geolocation-anonymous calling from legacy mobile devices. In this way, the call is placed from the same number; however, the TSP will not know the user's physical location. The scheme does not require any change on the network side and can be used on current mobile networks. The scheme is implemented for the GSM (commonly referred to as 2G) network, as it is the most widely used mode of mobile telephony communication. The feasibility of the scheme is demonstrated through a prototype. Shadow calling, which renders the user's geolocation anonymous, will be beneficial for users such as journalists, human rights activists in hostile nations, or other privacy-demanding users.
Date Created
2017