Full metadata
Title
Semantic feature extraction for narrative analysis
Description
A story is defined as "an actor(s) taking action(s) that culminates in a resolution(s)". I present novel sets of features that facilitate story detection in text via supervised classification and further reveal different forms within stories via unsupervised clustering. First, I investigate the utility of a new set of semantic features, compared to standard keyword features combined with statistical features such as the density of part-of-speech (POS) tags and named entities, for developing a story classifier. The proposed semantic features are based on triplets that can be extracted using a shallow parser. Experimental results show that a model of memory-based semantic linguistic features alongside statistical features achieves better accuracy. Next, I further improve story detection performance with a novel algorithm that aggregates the triplets to produce generalized concepts and relations. A major challenge in automated text analysis is that different words are used for related concepts. Analyzing text at the surface level would treat related concepts (i.e., actors, actions, targets, and victims) as different objects, potentially missing common narrative patterns. The algorithm clusters triplets into generalized concepts by utilizing syntactic criteria based on common contexts and semantic corpus-based statistical criteria based on "contextual synonyms". A generalized-concept representation of text (1) overcomes surface-level differences (which arise when different keywords are used for related concepts) without drift, (2) leads to a higher-level semantic network representation of related stories, and (3) when used as features, yields a significant (36%) boost in performance for the story detection task. Finally, I implement co-clustering based on generalized concepts/relations to automatically detect story forms. Overlapping generalized concepts and relationships correspond to archetypes/targets and actions that characterize story forms.
I perform co-clustering of stories using standard unigrams/bigrams and generalized concepts. I show that the residual error of factorization with concept-based features is significantly lower than the error with standard keyword-based features. I also present qualitative evaluations by a subject matter expert, which suggest that concept-based features yield more coherent, distinctive, and interesting story forms than those produced using standard keyword-based features.
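The abstract does not say which factorization or co-clustering method the thesis uses. As an illustration only, the sketch below assumes nonnegative matrix factorization (NMF) with multiplicative updates as a stand-in, applied to a made-up 4 x 4 story-by-feature matrix. The function names (`nmf`, `relative_residual`, `co_cluster`) and the toy data are hypothetical; the point is only to show what a "residual error of factorization" is and how row and column cluster labels can be read off the factors.

```python
import numpy as np

def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate a nonnegative matrix X (stories x features) as W @ H
    using multiplicative updates for the Frobenius-norm loss.
    Assumption: NMF is a stand-in here, not the method from the thesis."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def relative_residual(X, W, H):
    """Frobenius norm of the reconstruction error, relative to the norm of X."""
    return np.linalg.norm(X - W @ H) / np.linalg.norm(X)

def co_cluster(W, H):
    """Assign each story (row) and each feature (column) to its strongest factor."""
    return W.argmax(axis=1), H.argmax(axis=0)

# Hypothetical toy data: 4 stories x 4 features, with two visible blocks.
X = np.array([
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 1.0],
    [0.0, 0.0, 1.0, 4.0],
])

W, H = nmf(X, k=2)
story_labels, feature_labels = co_cluster(W, H)
err = relative_residual(X, W, H)
print("story clusters:", story_labels)
print("feature clusters:", feature_labels)
print("relative residual error:", err)
```

Under these assumptions, comparing feature sets would mean building one matrix from keyword features and one from concept-based features for the same stories, factoring each with the same rank, and comparing the two residuals.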
Date Created
2016
Contributors
- Ceran, Saadet Betul (Author)
- Davulcu, Hasan (Thesis advisor)
- Corman, Steven R. (Committee member)
- Shakarian, Paulo (Committee member)
- Ye, Jieping (Committee member)
- Arizona State University (Publisher)
Extent
vi, 66 pages : illustrations (some color)
Language
eng
Copyright Statement
In Copyright
Peer-reviewed
No
Open Access
No
Handle
https://hdl.handle.net/2286/R.I.40243
Statement of Responsibility
by Saadet Betul Ceran
Description Source
Viewed on November 8, 2016
Level of coding
full
Note
thesis
Partial requirement for: Ph.D., Arizona State University, 2016
bibliography
Includes bibliographical references (pages 62-66)
Field of study: Computer science
System Created
- 2016-10-12 02:17:32
System Modified
- 2021-08-30 01:21:36