Data Poisoning Attacks on Linked Data with Graph Regularization

Description

Social media has become the default channel for communication, and its usage has grown exponentially over the last decade. Myriad social media services such as Facebook, Twitter, Snapchat, and Instagram allow people to connect freely with their friends and followers. The number of attackers trying to exploit this situation has grown at a similar rate. Every social media service runs its own recommender systems and user-profiling algorithms, which use users' current information to make recommendations. The data produced by social media services is often linked data, since each item/user is usually linked with other users/items. Because of their ubiquitous and prominent nature, recommender systems are prone to several forms of attack, one of the most significant being poisoning of the training data. Since recommender systems use current user/item information as their training set, the attacker modifies the training set so that the recommender system either benefits the attacker or produces incorrect recommendations, thereby failing at its basic function. Most existing training-set attack algorithms work with "flat" attribute-value data that is typically assumed to be independent and identically distributed (i.i.d.). However, the i.i.d. assumption does not hold for social media data, which is inherently linked as described above. This thesis shows that incorporating user similarity through a graph regularizer when morphing the training data yields the most effective attack for the attacker, and demonstrates this with experiments on collaborative filtering over multiple datasets.
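The core technical ingredient referred to here is a graph regularizer that ties linked (similar) users' latent factors together in the collaborative filtering model the attacker targets. Below is a minimal sketch of such a graph-regularized matrix-factorization objective, not the thesis's actual attack code: the function names and toy data are illustrative, and the bilevel optimization an attacker would solve to morph the training set is omitted.

```python
import numpy as np

def user_laplacian(A):
    """Unnormalized graph Laplacian L = D - A from a user-user adjacency matrix."""
    D = np.diag(A.sum(axis=1))
    return D - A

def graph_regularized_loss(R, mask, U, V, L, lam=0.1, beta=0.1):
    """Loss of a low-rank model U V^T with a graph regularizer on user factors.

    R    : (n_users, n_items) rating matrix, including any injected (poisoned) rows
    mask : boolean array marking observed entries
    L    : user-user Laplacian built from the social links
    The trace term tr(U^T L U) = 0.5 * sum_ij A_ij ||u_i - u_j||^2, so linked
    (similar) users are pulled toward similar latent factors -- the property
    the attacker exploits when shaping fake profiles that blend in.
    """
    err = mask * (R - U @ V.T)
    return (0.5 * np.sum(err ** 2)
            + 0.5 * lam * (np.sum(U ** 2) + np.sum(V ** 2))
            + 0.5 * beta * np.trace(U.T @ L @ U))

# Toy usage: 5 users (the last one attacker-controlled), 4 items, rank-2 factors.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0, 1],
              [1, 0, 0, 1, 0],
              [1, 0, 0, 1, 1],
              [0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0]], dtype=float)
R = rng.integers(1, 6, size=(5, 4)).astype(float)
mask = rng.random((5, 4)) < 0.7
U, V = rng.normal(size=(5, 2)), rng.normal(size=(4, 2))
print(graph_regularized_loss(R, mask, U, V, user_laplacian(A)))
```

Because the trace term penalizes differences between linked users' factors, poisoned profiles that respect the social links distort the learned model while remaining consistent with the graph structure.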
Date Created
2019

Misinformation Detection in Social Media

Description

The pervasive use of social media gives it a crucial role in helping the public perceive reliable information. Meanwhile, the openness and timeliness of social networking sites also allow for the rapid creation and dissemination of misinformation, and it becomes increasingly difficult for online users to find accurate and trustworthy information. As recent incidents have shown, misinformation escalates quickly and can impact social media users with undesirable consequences, wreaking havoc almost instantaneously. In contrast to existing research on misinformation in psychology and the social sciences, detecting misinformation on social media platforms poses unprecedented challenges. First, intentional spreaders of misinformation actively disguise themselves. Second, the content of misinformation may be manipulated to avoid detection, while abundant contextual information may play a vital role in detecting it. Third, beyond accuracy, the earliness of a detection method is also important in containing misinformation before it goes viral. Fourth, social media platforms serve as a fundamental data source for various disciplines, and such research may have been conducted in the presence of misinformation. To tackle these challenges, we focus on developing machine learning algorithms that are robust to adversarial manipulation and data scarcity.

The main objective of this dissertation is to provide a systematic study of misinformation detection in social media. To tackle the challenges of adversarial attacks, I propose adaptive detection algorithms to deal with the active manipulations of misinformation spreaders via content and networks. To facilitate content-based approaches, I analyze the contextual data of misinformation and propose to incorporate the specific contextual patterns of misinformation into a principled detection framework. Considering its rapidly growing nature, I study how misinformation can be detected at an early stage. In particular, I focus on the challenge of data scarcity and propose a novel framework to enable historical data to be utilized for emerging incidents that are seemingly irrelevant. With misinformation being viral, applications that rely on social media data face the challenge of corrupted data. To this end, I present robust statistical relational learning and personalization algorithms to minimize the negative effect of misinformation.
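As a concrete illustration of the content-plus-context idea mentioned above, the following is a minimal sketch, not the dissertation's actual framework, of a detector that combines TF-IDF text features with simple contextual signals using scikit-learn; the example posts, contextual features, and labels are invented for illustration only.

```python
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy corpus: text content plus two contextual signals per post
# (e.g., number of reshares and whether the poster is a new account).
posts = ["miracle cure doctors hate", "city council meeting tonight",
         "shocking secret they hide", "weather alert for tomorrow"]
context = np.array([[250, 1], [3, 0], [500, 1], [12, 0]], dtype=float)
labels = np.array([1, 0, 1, 0])   # 1 = misinformation, 0 = legitimate

tfidf = TfidfVectorizer()
X_text = tfidf.fit_transform(posts)            # content features
X = hstack([X_text, csr_matrix(context)])      # content + context features
clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))
```

A detector robust to adversarial manipulation would go well beyond this toy model, but the sketch shows where contextual patterns enter alongside the (manipulable) content features.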
Date Created
2019

How Fake News Spreads in the U.S.: A Geographic Visualization System for Misinformation

Description

The spread of fake news (rumors) has been a growing problem on the internet over the past few years due to the rise of social media services. People sometimes share fake news articles on social media without knowing that those articles contain false information. Not knowing whether an article is fake or real is a problem because it erodes the credibility of news shared on social media. Prior research on fake news has focused on detection, but efforts toward controlling fake news articles on the internet still face challenges: it is hard to collect large sets of fake news data, it is hard to collect the locations of people who spread fake news, and it is difficult to study the geographic distribution of fake news. To address these challenges, I examine how fake news spreads in the United States (US) by developing a geographic visualization system for misinformation. I collect a set of fake news articles from the fact-checking website snopes.com. After collecting these articles, I extract the keywords from each article and store them in a file. I then use the stored keywords to search Twitter in order to find the locations of users who spread the rumors. Finally, I mark those locations on a map to show the geographic distribution of fake news. Having access to large sets of fake news data, knowing the locations of people who spread fake news, and being able to understand the geographic distribution of fake news will help efforts to address the fake news problem on the internet by providing target areas.
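A minimal sketch of the keyword-extraction and mapping steps of this pipeline is shown below. It is illustrative only: the Snopes scraping, Twitter search, and geocoding steps are stubbed out with hypothetical data, and the function names and stopword list are my own rather than the system's actual implementation.

```python
import re
from collections import Counter
import matplotlib.pyplot as plt

STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "that", "on", "for"}

def extract_keywords(article_text, k=5):
    """Crude frequency-based keyword extraction from one fact-check article."""
    words = re.findall(r"[a-z']+", article_text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 3)
    return [w for w, _ in counts.most_common(k)]

# Placeholder for the Twitter step: in the real pipeline the keywords are used
# as search queries, and the matching users' profile locations are geocoded to
# (lat, lon) pairs. Hypothetical article text and coordinates are used here.
article = "A viral post claims that the new tax law secretly doubles rates for retirees."
keywords = extract_keywords(article)
user_locations = [(33.45, -112.07), (40.71, -74.01), (41.88, -87.63)]  # hypothetical

# Mark the locations on a simple longitude/latitude scatter "map".
lats, lons = zip(*user_locations)
plt.scatter(lons, lats, c="red", marker="o")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.title("Users spreading: " + ", ".join(keywords))
plt.savefig("fake_news_spread.png")
```

In the full system, the scatter plot would be replaced by an interactive map of the US, with each marker corresponding to a Twitter user who shared a post matching the extracted keywords.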
Date Created
2018-05