Full metadata
Title
Interpretable Hate Speech Detection via Large Language Model-extracted Rationales
Description
Social media platforms have become widely used for open communication, yet their lack of moderation has led to the proliferation of harmful content, including hate speech. Manually monitoring such vast amounts of user-generated data is impractical, which necessitates automated hate speech detection methods. Pre-trained language models have been shown to possess strong base capabilities: they excel at in-distribution language modeling and also perform well at out-of-distribution language modeling, transfer learning, and few-shot learning. However, these models operate as complex function approximators that map input text to a hate speech classification without providing any insight into the reasoning behind their predictions. As a result, existing methods often lack transparency, which hinders their effectiveness, particularly in sensitive content moderation contexts. Recent efforts have sought to integrate these models with large language models such as ChatGPT and Llama2, which exhibit reasoning capabilities and broad knowledge utilization. This thesis explores leveraging the reasoning abilities of large language models to enhance the interpretability of hate speech detection. It proposes a novel framework that uses state-of-the-art large language models (LLMs) to extract interpretable rationales from input text, highlighting key phrases or sentences relevant to hate speech classification. Because these rationale features are incorporated into a hate speech classifier, the framework's results are inherently transparent and interpretable. This approach combines the language understanding of LLMs with the discriminative power of advanced hate speech classifiers, offering a promising solution to the challenge of interpreting automated hate speech detection models.
Date Created
2024
Contributors
- Nirmal, Ayushi (Author)
- Liu, Huan (Thesis advisor)
- Davulcu, Hasan (Committee member)
- Wei, Hua (Committee member)
- Arizona State University (Publisher)
Topical Subject
Resource Type
Extent
58 pages
Language
eng
Copyright Statement
In Copyright
Primary Member of
Peer-reviewed
No
Open Access
No
Handle
https://hdl.handle.net/2286/R.2.N.193452
Level of coding
minimal
Cataloging Standards
Note
Partial requirement for: M.S., Arizona State University, 2024
Field of study: Computer Science
System Created
- 2024-05-02 01:38:14
System Modified
- 2024-05-02 01:38:21