Enhancing Binary Analysis through Cognitive Load Theory

Description
Reverse engineering is a process focused on gaining an understanding of the intricacies of a system. This practice is critical in cybersecurity as it promotes the finding and patching of vulnerabilities as well as the counteracting of malware. Disassemblers and decompilers have become essential to reverse engineering due to the readability of the information they transcribe from binary files. However, these tools still tend to produce involved and complicated outputs that hinder the acquisition of knowledge during binary analysis. Cognitive Load Theory (CLT) explains this hindrance as a consequence of the human brain's inability to process superfluous amounts of data. CLT classifies this data into three types of cognitive load (intrinsic, extraneous, and germane), each of which helps gauge complex procedures. In this research paper, a novel program call graph that accounts for these CLT principles is presented. The goal of this graphical view is to reduce the cognitive load tied to the depiction of binary information and to enhance the overall binary analysis process. This feature was implemented within the binary analysis tool angr and its user interface counterpart, angr-management. Additionally, this paper examines a user study conducted to quantitatively and qualitatively evaluate the effectiveness of the newly proposed proximity view (PV). The user study includes a binary challenge-solving portion measured by defined metrics and a survey phase to gather direct participant feedback regarding the view. The results from this study show statistically significant evidence that PV aids in challenge solving and improves the overall understanding of binaries. The results also indicate that this improvement comes at the cost of time. The survey portion of the user study further indicates that users find PV beneficial to the reverse engineering process, but that additional information needs to be included in future developments.
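
As a rough illustration of the kind of decluttered call-graph view described above, the sketch below uses angr to recover a call graph and trims it to the neighborhood of a single focus function, dropping PLT stubs and SimProcedures that would add extraneous load. The binary path and focus function are hypothetical, and the actual proximity view shipped in angr-management is considerably more involved than this.

```python
# A minimal sketch, not the real proximity view: build a call graph with angr
# and keep only the neighborhood of one focus function. The binary path
# "./challenge" and the focus on "main" are hypothetical examples.
import angr
import networkx as nx

proj = angr.Project("./challenge", auto_load_libs=False)
proj.analyses.CFGFast()                     # populates the knowledge base

callgraph = proj.kb.callgraph               # networkx graph of function addresses
main = proj.kb.functions.function(name="main")

# Restrict to functions within two call edges of the focus function; distant
# nodes mostly contribute extraneous cognitive load.
local = nx.ego_graph(callgraph, main.addr, radius=2, undirected=True)

# Drop PLT stubs and SimProcedures so library plumbing does not crowd the view.
for addr in list(local.nodes):
    func = proj.kb.functions.function(addr=addr)
    if func is None or func.is_plt or func.is_simprocedure:
        local.remove_node(addr)

for addr in local.nodes:
    print(hex(addr), proj.kb.functions.function(addr=addr).name)
```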
Date Created
2022

A Preliminary Approach for Rewriting AVX Instructions for Binary Decompilation

Description

A proposed solution for the decompilation of binaries that include Intel Advanced Vector Extensions (AVX) instruction sets is presented, along with an explanation of the methodology and an overview of the difficulties encountered in the current decompilation process. A simple approach was taken to convert vector operations into scalar operations reflected in new assembly code. This new code overwrites instructions that use AVX registers so that all available decompilation software can properly decompile binaries using those registers. The results show that this approach is functional and successful at resolving the decompilation problem, though there may be ways to optimize the performance of the output. In conclusion, our theoretical work can easily be extended and applied to a wider range of instructions and instruction sets to resolve related decompilation issues with binaries utilizing extension instructions.
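
To illustrate the vector-to-scalar rewriting idea (not the thesis's actual rewriting pass), a minimal sketch might pattern-match packed AVX arithmetic and emit an equivalent per-lane scalar sequence. The mnemonic table and the stack-slot addressing below are illustrative assumptions.

```python
# A sketch of the vector-to-scalar rewriting idea. The mnemonic table and the
# "<reg>_slot" stack-slot addressing are illustrative assumptions, not the
# thesis's actual binary rewriting pass.

# Packed AVX mnemonic -> (scalar replacement, float lanes in a 256-bit ymm)
PACKED_TO_SCALAR = {
    "vaddps": ("addss", 8),
    "vsubps": ("subss", 8),
    "vmulps": ("mulss", 8),
}

def scalarize(line: str) -> list[str]:
    """Expand one packed AVX instruction into an equivalent per-lane sequence."""
    parts = line.split(maxsplit=1)
    if parts[0] not in PACKED_TO_SCALAR:
        return [line]                         # leave non-AVX instructions alone
    scalar, lanes = PACKED_TO_SCALAR[parts[0]]
    dst, src1, src2 = [op.strip() for op in parts[1].split(",")]
    out = []
    for lane in range(lanes):
        off = lane * 4                        # 4 bytes per single-precision lane
        out.append(f"movss xmm0, dword ptr [{src1}_slot + {off}]")
        out.append(f"{scalar} xmm0, dword ptr [{src2}_slot + {off}]")
        out.append(f"movss dword ptr [{dst}_slot + {off}], xmm0")
    return out

for rewritten in scalarize("vaddps ymm0, ymm1, ymm2"):
    print(rewritten)
```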

Date Created
2022-05

An Exploratory Literature Review of Efforts Towards Improving Cybersecurity

Description
Data breaches and software vulnerabilities are increasingly severe problems that incur both monetary and reputational costs for companies as well as societal impacts. While companies have clear monetary and legal incentives to mitigate the risk of data breaches, they have significantly less incentive to mitigate software product vulnerabilities, and their existing incentive is widely considered insufficient. In this thesis, I initially set out to perform a statistical analysis correlating company characteristics and behavior with the characteristics of the data breaches they suffer, as well as a meta-analysis of existing literature. While the attempted statistical analysis was hindered by the lack of sufficiently comprehensive free company datasets, I have recorded my efforts in finding suitable databases. I have also performed an exploratory literature review of 15 papers in the field of improving cybersecurity, identified four blockers to security and three elements of solutions addressed by the papers, and derived insights from the distribution of these blockers and solution elements across the papers reviewed.
Date Created
2022-05

A Verifiable Distributed Voting System Without a Trusted Party

Description
Cryptographic voting systems such as Helios rely heavily on a trusted party to maintain privacy or verifiability. This tradeoff can be done away with by using distributed substitutes for the components that require a trusted party. By replacing the encryption, shuffle, and decryption steps described by Helios with Pedersen threshold encryption and the Neff shuffle, it is possible to obtain a distributed voting system that achieves both privacy and verifiability without trusting any of the contributors. This thesis examines existing approaches to this problem and their shortcomings, and provides empirical metrics for comparing working solutions in detail.
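
As a toy illustration of the distributed-decryption idea (an n-of-n simplification of Pedersen's threshold scheme, with tiny parameters and none of the required zero-knowledge proofs), the following sketch shows trustees jointly forming an ElGamal key and decrypting a ballot without any single party ever holding the full secret.

```python
# Toy n-of-n distributed ElGamal, illustrating the idea behind Pedersen-style
# threshold decryption. Assumptions: a tiny group, no Shamir sharing, and no
# proofs of correct decryption; purely a sketch of the trust structure.
import random

p, q, g = 23, 11, 4          # g generates the order-11 subgroup mod 23

# Key generation: each trustee keeps x_i secret and publishes g^x_i.
secrets = [random.randrange(1, q) for _ in range(3)]
pk = 1
for x in secrets:
    pk = pk * pow(g, x, p) % p      # joint public key h = g^(x1+x2+x3)

# Encryption of a ballot m (must lie in the subgroup for this toy group).
m, r = 13, random.randrange(1, q)
c1, c2 = pow(g, r, p), m * pow(pk, r, p) % p

# Decryption: each trustee contributes c1^x_i; no single party can decrypt.
shares = [pow(c1, x, p) for x in secrets]
combined = 1
for d in shares:
    combined = combined * d % p
recovered = c2 * pow(combined, -1, p) % p
assert recovered == m
print("recovered ballot:", recovered)
```

A full Pedersen scheme additionally secret-shares each trustee key with a Shamir polynomial so that any t of n trustees can produce the decryption shares, tolerating dropouts.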
Date Created
2021

Analyzing, Understanding, and Improving Predicted Variable Names in Decompiled Binary Code

Description
Reverse engineers use decompilers to analyze binaries when their source code is unavailable. A binary decompiler attempts to transform binary programs to their corresponding high-level source code by recovering and inferring the information that was lost during the compilation process. One type of information that is lost during compilation is variable names, which are critical for reverse engineers to analyze and understand programs. Traditional binary decompilers generally use automatically generated placeholder variable names that are meaningless or have little correlation with their intended semantics. Having correct or meaningful variable names in decompiled code, instead of placeholder variable names, greatly increases the readability of decompiled binary code. Decompiled Identifier Renaming Engine (DIRE) is a state-of-the-art, deep-learning-based solution that automatically predicts variable names in decompiled binary code. However, DIRE's prediction results are far from perfect. The first goal of this research project is to take a close look at the current state-of-the-art solution for automated variable name prediction on decompilation output of binary code, assess the prediction quality, and understand how the prediction results can be improved. Then, as the second goal of this research project, I aim to improve the prediction quality of variable names. With a thorough understanding of DIRE's issues, I focus on improving the quality of the training data. This thesis proposes a novel approach to improving the quality of the training data by normalizing variable names and converting their abbreviated forms to their full forms. I implemented and evaluated the proposed approach on data sets of over 10k and 20k binaries and showed improvements over DIRE.
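
A minimal sketch of the normalization idea follows, assuming a hypothetical abbreviation dictionary and a simple subtoken splitter; the thesis builds a far larger mapping over real decompiler output.

```python
# A minimal sketch of identifier normalization, assuming a hypothetical
# abbreviation dictionary; the actual preprocessing pipeline is more extensive.
import re

EXPANSIONS = {"cnt": "count", "idx": "index", "ptr": "pointer",
              "buf": "buffer", "len": "length", "num": "number"}

def normalize(identifier: str) -> str:
    """Split an identifier into subtokens and expand known abbreviations."""
    # Handles both camelCase and snake_case, e.g. "bufIdx" -> ["buf", "Idx"].
    parts = re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", identifier)
    return "_".join(EXPANSIONS.get(p.lower(), p.lower()) for p in parts)

print(normalize("bufIdx"))   # buffer_index
print(normalize("buf_len"))  # buffer_length
```

The intended effect is that semantically identical names such as bufIdx and buffer_index collapse to a single training token, giving the model a cleaner signal.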
Date Created
2021

Exploiting and Mitigating Advanced Security Vulnerabilities

Description
Cyberspace has become a field where the competitive arms race between defenders and adversaries plays out. Adaptive, intelligent adversaries are crafting new responses to advanced defenses even though the arms race has resulted in a gradual improvement of the security posture. This dissertation aims to assess the evolving threat landscape and enhance state-of-the-art defenses by exploiting and mitigating two different types of emerging security vulnerabilities. I first design a new cache attack method named Prime+Count, which features low noise and requires no shared memory. I use the method to construct fast data covert channels. Then, I propose a novel software-based approach, SmokeBomb, to prevent cache side-channel attacks on inclusive and non-inclusive caches through the creation of a private space in the L1 cache. I demonstrate the effectiveness of SmokeBomb by applying it to two different ARM processors with different instruction set versions and cache models and carrying out an in-depth evaluation. Next, I introduce an automated approach that exploits a stack-based information leak vulnerability in operating system kernels to obtain sensitive data. I also propose a lightweight and widely applicable runtime defense, ViK, for preventing temporal memory safety violations, which, together with information leak vulnerabilities, can give attackers arbitrary code execution or privilege escalation. The security impact of temporal memory safety vulnerabilities is critical, but they are difficult to identify because of the complexity of real-world software and the spatial separation of allocation and deallocation code. Therefore, I focus on preventing not the vulnerabilities themselves but their exploitation. ViK can effectively protect operating system kernels and user-space programs from temporal memory safety violations, imposing low runtime and memory overhead.
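
As a very loose illustration of the count-based encoding that distinguishes a Prime+Count-style channel from set-indexed attacks such as Prime+Probe, the following toy simulation stands in for real cache measurements; nothing here touches actual hardware state.

```python
# Toy simulation of the count-based encoding behind a Prime+Count-style covert
# channel. Assumption: a set of integers stands in for cache sets; the
# dissertation's method measures actual cache behavior on ARM processors.
NUM_SETS = 16

def sender_encode(symbol: int) -> set[int]:
    """Encode a value 0..NUM_SETS by touching that many distinct cache sets."""
    return set(range(symbol))

def receiver_decode(touched: set[int]) -> int:
    """The receiver primed every set beforehand and now simply counts how many
    were disturbed. Only the count carries information (not which sets), which
    is what tolerates noise and avoids any shared memory with the sender."""
    return sum(1 for s in range(NUM_SETS) if s in touched)

for value in (0, 5, 12):
    assert receiver_decode(sender_encode(value)) == value
print("toy channel round-trips all tested symbols")
```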
Date Created
2021

Cryptojacking Detection: A Classification and Comparison of Malicious Cryptocurrency Mining Detection Systems

Description

Cryptojacking is a process in which a program utilizes a user's CPU to mine cryptocurrencies without the user's knowledge. Since cryptojacking is a relatively new problem and its impact is still limited, very little has been done to combat it. Multiple studies have been conducted in which a cryptojacking detection system is implemented, but none of these systems has truly solved the problem. This thesis surveys existing studies and provides a classification and evaluation of each detection system with the aim of determining their pros and cons. The result of the evaluation indicates that it might be possible to bypass detection by existing systems by modifying the cryptojacking code. In addition to this classification, I developed an automatic code instrumentation program that replaces specific instructions with functionally similar sequences as a way to show how easily simple obfuscation can bypass detection by existing systems.
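
A minimal sketch of the instruction-substitution idea follows, with illustrative rewrite rules standing in for the thesis's actual instrumentation tool.

```python
# A sketch of replacing instructions with functionally similar sequences, the
# obfuscation idea used to evade signature-like cryptojacking detectors.
# The two rewrite rules below are illustrative stand-ins only.
import re

def obfuscate(line: str) -> list[str]:
    """Rewrite selected instructions into equivalent alternative forms."""
    # add reg, imm  ->  sub reg, -imm   (same result, different opcode)
    m = re.fullmatch(r"add (\w+), (\d+)", line)
    if m:
        return [f"sub {m.group(1)}, -{m.group(2)}"]
    # xor reg, reg  ->  mov reg, 0      (both zero the register)
    m = re.fullmatch(r"xor (\w+), \1", line)
    if m:
        return [f"mov {m.group(1)}, 0"]
    return [line]

for original in ("add eax, 4", "xor ecx, ecx", "ret"):
    print(original, "->", obfuscate(original))
```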

Date Created
2021-05

Towards Advanced Malware Classification: A Reused Code Analysis of Mirai Botnet and Ransomware

Description
Due to the increasing dependency on computers and databases, the damage caused by malicious code is growing. Moreover, the gravity and magnitude of malicious attacks by hackers grow at an unprecedented rate. A key challenge lies in detecting such malicious attacks and code in real time with existing methods, such as signature-based detection approaches. To this end, computer scientists have attempted to classify heterogeneous types of malware on the basis of their observable characteristics. Existing literature focuses on classifying binary code, due to the greater accessibility of malware binaries than source code. Also, for improved speed and scalability, machine learning-based approaches are widely used. Despite such merits, the machine learning-based approach critically lacks interpretability of its outcome, which restricts understanding of why a given code belongs to a particular type of malware and, importantly, why some portions of code are reused very often by hackers. In this light, this study aims to enhance understanding of malware by directly investigating reused code and uncovering its characteristics.

To examine reused code in malware, both malware with source code and malware with only binary code are considered in this thesis. For malware with source code, reused code chunks in the Mirai botnet are examined: this study lists frequently reused code chunks and analyzes the characteristics and locations of the code. For malware with binary code, this study performs reverse engineering on the binary code so that human readers can comprehend it, visually inspects reused code in binary ransomware code, and illustrates the functionality of the reused code on the basis of similar behaviors and tactics.
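
As a rough sketch of how reused chunks can be located in source, the following windowed-matching example flags identical line windows appearing in more than one file. The file names and contents are hypothetical stand-ins for the Mirai source tree.

```python
# A sketch of reused-code-chunk detection over source files via fixed-size
# line windows; a simple stand-in for the clone analysis described above.
from collections import defaultdict

WINDOW = 3   # chunk size in lines

def chunks(source: str):
    lines = [l.strip() for l in source.splitlines() if l.strip()]
    for i in range(len(lines) - WINDOW + 1):
        yield i, "\n".join(lines[i:i + WINDOW])

def find_reused(files: dict[str, str]) -> dict[str, list[tuple[str, int]]]:
    """Map each chunk to the (file, line offset) locations where it appears."""
    seen = defaultdict(list)
    for name, source in files.items():
        for offset, chunk in chunks(source):
            seen[chunk].append((name, offset))
    # Reused = appears at more than one location.
    return {c: locs for c, locs in seen.items() if len(locs) > 1}

files = {
    "attack_udp.c": "build_header();\nrandomize_source();\nsend_packet();\n",
    "attack_tcp.c": "set_flags();\nbuild_header();\nrandomize_source();\nsend_packet();\n",
}
for chunk, locations in find_reused(files).items():
    print(locations, "share:\n" + chunk)
```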

This study makes a novel contribution to the literature by directly investigating the characteristics of reused code in malware. The findings of the study can help cybersecurity practitioners and scholars increase the performance of malware classification.
Date Created
2020

Everything You Ever Wanted to Know About Bitcoin Mixers (But Were Afraid to Ask)

Description
The lack of fungibility in Bitcoin has forced its userbase to seek out tools that can heighten their anonymity. Third-party Bitcoin mixers utilize obfuscation techniques to protect participants from blockchain analysis. In recent years, various centralized and decentralized Bitcoin mixing implementations have been proposed in academic literature. Although these methods promise users a threat-free environment in which to preserve their anonymity, public Bitcoin mixers continue to be associated with theft and poor implementation.

This research explores the public Bitcoin mixer ecosystem to identify if today's mixing services have adopted academically proposed solutions. This is done through real-world interactions with publicly available mixers to analyze both implementation and resistance to common threats in the mixing landscape. First, proposed decentralized and centralized mixing protocols found in literature are outlined. Then, data is presented from 19 publicly announced mixing services available on the deep web and clearnet. The services are categorized based on popularity with the Bitcoin community and experiments are conducted on five public mixing services: ChipMixer, MixTum, Bitcoin Mixer, CryptoMixer, and Sudoku Wallet.

The results of the experiments highlight a clear gap between public and proposed Bitcoin mixers in both implementation and security. Today's mixing services focus on presenting users with a false sense of control to gain their trust rather than employing secure mixing techniques. As a result, the five selected services lack implementation of academically proposed techniques and display poor resistance to common mixer-related threats.
Date Created
2020

Leveraging Scalable Data Analysis to Proactively Bolster the Anti-Phishing Ecosystem

Description
Despite an abundance of defenses that work to protect Internet users from online threats, malicious actors continue deploying relentless large-scale phishing attacks that target these users. Effectively mitigating phishing attacks remains a challenge for the security community due to attackers' ability to evolve and adapt to defenses, the cross-organizational nature of the infrastructure abused for phishing, and discrepancies between theoretical and realistic anti-phishing systems. Although technical countermeasures cannot always compensate for the human weakness exploited by social engineers, maintaining a clear and up-to-date understanding of the motivation behind, and execution of, modern phishing attacks is essential to optimizing such countermeasures.

In this dissertation, I analyze the state of the anti-phishing ecosystem and show that phishers use evasion techniques, including cloaking, to bypass anti-phishing mitigations in hopes of maximizing the return-on-investment of their attacks. I develop three novel, scalable data-collection and analysis frameworks to pinpoint the ecosystem vulnerabilities that sophisticated phishing websites exploit. The frameworks, which operate on real-world data and are designed for continuous deployment by anti-phishing organizations, empirically measure the robustness of industry-standard anti-phishing blacklists (PhishFarm and PhishTime) and proactively detect and map phishing attacks prior to launch (Golden Hour). Using these frameworks, I conduct a longitudinal study of blacklist performance and the first large-scale end-to-end analysis of phishing attacks (from spamming through monetization). As a result, I thoroughly characterize modern phishing websites and identify desirable characteristics for enhanced anti-phishing systems, such as more reliable methods for the ecosystem to collectively detect phishing websites and meaningfully share the corresponding intelligence. In addition, findings from these studies led to actionable security recommendations that were implemented by key organizations within the ecosystem to help improve the security of Internet users worldwide.
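
For a concrete picture of the cloaking behavior these frameworks are built to detect and measure, the following toy server shows the core trick: returning a benign decoy to suspected security crawlers while serving the phishing page to everyone else. The user-agent heuristics are illustrative assumptions, not observed attacker code.

```python
# A sketch of server-side cloaking, the evasion technique measured above:
# different content depending on who appears to be asking. The crawler
# fingerprints below are hypothetical examples.
from http.server import BaseHTTPRequestHandler, HTTPServer

CRAWLER_HINTS = ("googlebot", "phishtank", "curl", "python-requests")

class CloakingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        agent = self.headers.get("User-Agent", "").lower()
        if any(hint in agent for hint in CRAWLER_HINTS):
            body = b"<html>Nothing to see here.</html>"   # benign decoy
        else:
            body = b"<html>Fake login form...</html>"     # shown to victims
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), CloakingHandler).serve_forever()
```

A blacklist-measurement framework must account for exactly this kind of request filtering, since a crawler that is trivially fingerprinted never sees the malicious content it is trying to flag.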
Date Created
2020