Algorithmic Solutions for Socially Responsible AI

Description
Artificial intelligence (AI) has the potential to drive us towards a future in which all of humanity flourishes. It also comes with substantial risks of oppression and calamity. For example, social media platforms have knowingly and surreptitiously promoted harmful content, such as rampant disinformation and hate speech. Machine learning algorithms designed to combat hate speech have themselves been found to be biased against underrepresented and disadvantaged groups. In response, researchers and organizations have been working to publish principles and regulations for the responsible use of AI. However, these conceptual principles also need to be turned into actionable algorithms to materialize AI for good. The broad aim of my research is to design AI systems that responsibly serve users and to develop applications with social impact. This dissertation seeks to develop algorithmic solutions for Socially Responsible AI (SRAI), a systematic framework encompassing responsible AI principles and algorithms and the responsible use of AI. In particular, it first introduces an interdisciplinary definition of SRAI and the AI responsibility pyramid, in which four types of AI responsibilities are described. It then elucidates the purpose of SRAI: how to bridge from the conceptual definitions to responsible AI practice through three human-centered operations: to Protect and Inform users, and to Prevent negative consequences. These operations are illustrated in the social media domain, given that social media has revolutionized how people live but has also contributed to the rise of many societal issues. The representative tasks for the three dimensions are cyberbullying detection, disinformation detection and dissemination, and unintended bias mitigation. The means of SRAI is to develop responsible AI algorithms. Many issues (e.g., discrimination and poor generalization) can arise when AI systems are trained to improve accuracy without knowledge of the underlying causal mechanism.
Causal inference, therefore, is intrinsically related to understanding and resolving these challenging issues in AI. As a result, this dissertation also seeks to gain an in-depth understanding of AI by looking into the precise relationships between causes and effects. For illustration, it introduces a recent work that applies deep learning to estimating causal effects and shows that causal learning algorithms can outperform traditional methods.
Date Created
2022

External Validity of Estimates of Social Distance

Description
Social discounting underlies individual altruistic decision-making, and it is frequently measured as the amount of hypothetical money one is willing to forgo for another person as a function of social distance. In the classic social discounting task, individual participants are asked to imagine their friends along a continuum of social distance, which is then used to estimate each participant’s social discounting rate. While an ever-growing proportion of social interactions takes place over social media, no research has yet characterized social discounting in that context. Moreover, no research has estimated the social discounting rate using real persons’ social distance instead of the hypothetical continuum described above. Using existing social media indicators of social distance, it is now possible to estimate the social discounting rate based on real people, which may lead to more accurate social discounting measurements and may extend the discounting model to real-life situations. Specifically, using computer algorithms to estimate social distance from social media data makes it possible to assess the utility of numeric social distance indicators and the most appropriate ways to represent them. The study examined the extent to which a hyperbolic model of social discounting fits social distance information retrieved from Facebook pages, assessed whether discounting rates differ when real or hypothetical social distance is used, and investigated whether discounting rates based on real persons reflect the social distance perceived by the participant or the imaginary social distance scale (i.e., an experimental artifact).

It was found that the social discounting model can be applied in the social media context, even when real Facebook friends’ profiles were used as substitutes for numeric social distance indicators. Additionally, people showed similar altruistic tendencies in both the numeric and profile social discounting tests in the Facebook environment. These findings were qualified, however, by a high rate of nonsystematic data for the profile group, a rate much higher than in the traditional numeric paradigm. This discrepancy suggests that the allocation rates under the numeric and profile approaches need further investigation to determine the factors affecting individuals’ generosity as a function of social distance indicators.
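
The abstract does not state the hyperbolic model explicitly. As a minimal sketch, assuming the form commonly used in the social discounting literature, v = V / (1 + kN), where V is the undiscounted amount, N the social distance, and k the discounting rate, a rate could be fitted to allocation data as follows. The function names, the grid search, and the data are hypothetical illustrations, not the study's procedure or results.

```python
def hyperbolic_value(V, k, N):
    """Amount forgone for a person at social distance N (assumed model)."""
    return V / (1.0 + k * N)


def fit_k(distances, amounts, V, k_grid=None):
    """Pick the k on a grid that minimizes the sum of squared errors."""
    if k_grid is None:
        # Hypothetical search grid: 0.001, 0.002, ..., 1.000
        k_grid = [i / 1000.0 for i in range(1, 1001)]

    def sse(k):
        return sum(
            (a - hyperbolic_value(V, k, n)) ** 2
            for n, a in zip(distances, amounts)
        )

    return min(k_grid, key=sse)


# Synthetic data generated from the model itself (not data from the study).
distances = [1, 2, 5, 10, 20, 50, 100]
V = 75.0
true_k = 0.05
amounts = [hyperbolic_value(V, true_k, n) for n in distances]

k_hat = fit_k(distances, amounts, V)
print(k_hat)
```

A larger fitted k means generosity falls off more steeply with social distance. With real responses, a nonlinear least-squares routine would normally replace the grid search.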
Date Created
2018

Efficient processing of skyline queries on static data sources, data streams and incomplete datasets

Description
Skyline queries extract interesting points that are non-dominated and help paint the bigger picture of the data in question. They are valuable in many multi-criteria decision applications and are becoming a staple of decision support systems.
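
To make "non-dominated" concrete, here is a minimal sketch of the standard dominance test and a naive quadratic skyline computation. It assumes smaller values are preferred in every dimension; the hotel data and function names are hypothetical, and this is not one of the dissertation's algorithms.

```python
def dominates(p, q):
    """p dominates q if p is no worse everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Keep every point that no other point dominates (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels as (price, distance to beach); lower is better for both.
hotels = [(50, 8), (60, 2), (80, 1), (70, 5), (90, 9)]
result = skyline(hotels)
print(result)  # [(50, 8), (60, 2), (80, 1)]
```

Here (70, 5) drops out because (60, 2) is both cheaper and closer, and (90, 9) drops out because (50, 8) beats it on both criteria.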

An assumption commonly made by many skyline algorithms is that a skyline query is applied to a single static data source or data stream. Unfortunately, this assumption does not hold in many applications, in which a skyline query may involve attributes belonging to multiple data sources and may require a join operation to be performed before the skyline can be produced. Recently, various skyline-join algorithms have been proposed to address this problem in the context of static data sources. However, these algorithms suffer from several drawbacks: they often need to scan the data sources exhaustively to obtain the skyline-join results; moreover, the pruning techniques employed to eliminate tuples are largely based on expensive tuple-to-tuple comparisons. Most data stream techniques, on the other hand, focus on single-stream skyline queries, which renders them unsuitable for skyline-join queries.
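
The cost described above can be seen in a naive baseline that materializes the full join and only then compares joined tuples pairwise. This is a sketch of that baseline under assumed conventions (equi-join on a key, smaller is better), with hypothetical data; it is not one of the skyline-join algorithms the abstract refers to.

```python
def dominates(p, q):
    """p dominates q if p is no worse everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def naive_skyline_join(left, right):
    """Join on the key, then run a tuple-to-tuple skyline over the joined rows."""
    # Exhaustive scan of both sources to build every joined tuple.
    joined = [
        (lk, l_attrs + r_attrs)
        for lk, l_attrs in left
        for rk, r_attrs in right
        if lk == rk
    ]
    # Pairwise comparison of every joined tuple against every other.
    return [
        (key, attrs)
        for key, attrs in joined
        if not any(dominates(other, attrs) for _, other in joined)
    ]


# Hypothetical sources: flights as (city, (fare, stops)), hotels as (city, (rate,)).
flights = [("A", (100, 3)), ("B", (80, 5))]
hotels = [("A", (40,)), ("A", (60,)), ("B", (30,))]

result = naive_skyline_join(flights, hotels)
print(result)  # [('A', (100, 3, 40)), ('B', (80, 5, 30))]
```

Both steps touch every tuple: the join scans the sources in full, and the skyline step compares each joined tuple with all the others.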

Another assumption typically made by most of the earlier skyline algorithms is that the data is complete and all skyline attribute values are available. Due to this constraint, these algorithms cannot be applied to incomplete data sources in which some of the attribute values are missing and are represented by NULL values. A definition of dominance for incomplete data exists, but it leads to undesirable consequences such as non-transitive and cyclic dominance relations, both of which are detrimental to skyline processing.
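
The abstract does not spell out which definition it means. As an illustration, assume the one that compares two tuples only on the dimensions where both have values, with `None` standing in for NULL and smaller values preferred. Under that assumption, the three hypothetical tuples below form a dominance cycle.

```python
def dominates_incomplete(p, q):
    """Compare only the dimensions that are present in both tuples."""
    common = [(a, b) for a, b in zip(p, q) if a is not None and b is not None]
    return (
        bool(common)
        and all(a <= b for a, b in common)
        and any(a < b for a, b in common)
    )


# Hypothetical tuples; None marks a missing (NULL) attribute.
a = (1, None, 10)
b = (3, 2, None)
c = (None, 5, 3)

print(dominates_incomplete(a, b))  # True: compared on dimension 0 only
print(dominates_incomplete(b, c))  # True: compared on dimension 1 only
print(dominates_incomplete(c, a))  # True: compared on dimension 2 only
print(dominates_incomplete(a, c))  # False: transitivity fails
```

Because a beats b, b beats c, and c beats a, every tuple is dominated by some other one, so pruning rules that rely on transitivity can discard tuples incorrectly or leave an empty skyline.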

Based on the aforementioned observations, the main goal of the research described in this dissertation is the design and development of a framework of skyline operators that effectively handles three distinct types of skyline queries: 1) skyline-join queries on static data sources, 2) skyline-window-join queries over data streams, and 3) strata-skyline queries on incomplete datasets. This dissertation presents the unique challenges posed by these skyline queries and addresses the shortcomings of current skyline techniques by proposing efficient methods to tackle the added overhead in processing skyline queries on static data sources, data streams, and incomplete datasets.
Date Created
2014