Full metadata
Title
A semantic triplet based story classifier
Description
Text classification, in the artificial intelligence domain, is an activity in which text documents are automatically classified into predefined categories using machine learning techniques. An example is classifying uncategorized news articles into predefined categories such as "Business", "Politics", "Education", and "Technology". This thesis follows a supervised machine learning approach, in which a module is first trained with pre-classified training data and the class of the test data is then predicted. Good feature extraction is an important step in the machine learning approach, and hence the main component of this text classifier is semantic triplet based features, used in addition to traditional features such as standard keyword based features and statistical features based on shallow parsing (such as the density of POS tags and named entities). A triplet {Subject, Verb, Object} in a sentence is defined as a relation between the subject and the object, the relation being the predicate (verb). Triplet extraction is a five-step process whose input corpus is a set of web text documents, each consisting of one or more paragraphs, drawn from sources ranging from RSS feeds to lists of extremist websites. The input corpus feeds into the "Pronoun Resolution" step, which uses a heuristic approach to identify the noun phrases referenced by the pronouns. The next step, the "SRL Parser", is a shallow semantic parser that converts the incoming pronoun-resolved paragraphs into an annotated predicate-argument format. The output of the SRL parser is processed by the "Triplet Extractor" algorithm, which forms triplets of the form {Subject, Verb, Object}. Generalization and reduction of the triplet features is the next step. The reduced feature representation reduces computing time, yields better discriminatory behavior, and helps handle the curse of dimensionality. For training and testing, a ten-fold cross-validation approach is followed. In each round, an SVM classifier is trained with 90% of the labeled (training) data, and in the testing phase the classes of the remaining 10% of unlabeled (testing) data are predicted. In conclusion, this thesis proposes a model with semantic triplet based features for story classification. The effectiveness of the model is demonstrated against other traditional features used in the literature for text classification tasks.
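The triplet-forming step and the 90%/10% fold structure described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes the SRL output is available as one dictionary per predicate with PropBank-style keys "V", "A0", and "A1" (the abstract does not name the labels), and the function names `extract_triplets` and `ten_fold_splits` are hypothetical. Pronoun resolution, feature generalization, and SVM training are omitted.

```python
from typing import Dict, Iterator, List, Tuple

Triplet = Tuple[str, str, str]


def extract_triplets(frames: List[Dict[str, str]]) -> List[Triplet]:
    """Form (subject, verb, object) triplets from predicate-argument frames.

    Assumption: each frame uses the keys "A0" (subject/agent),
    "V" (predicate) and "A1" (object/patient). These labels are
    illustrative and may differ from the parser used in the thesis.
    """
    triplets = []
    for frame in frames:
        subject, verb, obj = frame.get("A0"), frame.get("V"), frame.get("A1")
        # Keep only frames in which all three roles are present.
        if subject and verb and obj:
            triplets.append((subject, verb, obj))
    return triplets


def ten_fold_splits(n_items: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    for fold in range(k):
        test = indices[fold::k]  # every k-th item, offset by the fold number
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


# Hypothetical SRL output for two sentences.
frames = [
    {"V": "attacked", "A0": "The rebels", "A1": "the convoy"},
    {"V": "rained"},  # no subject or object, so no triplet is formed
]
triplets = extract_triplets(frames)

# With 100 labeled stories, each round trains on 90 and tests on 10.
splits = list(ten_fold_splits(100))
```

Each story appears in the test set of exactly one round, so every labeled example is used for both training and testing across the ten rounds.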
Date Created
2013
Contributors
- Karad, Ravi Chandravadan (Author)
- Davulcu, Hasan (Thesis advisor)
- Corman, Steven (Committee member)
- Sen, Arunabha (Committee member)
- Arizona State University (Publisher)
Topical Subject
- Computer Science
- A Semantic Triplet Based Story Classifier
- Machine Learning
- Natural Language Processing
- Ravi Karad
- SVM (Support Vector Machine) classifier
- Text classification
- Supervised learning (Machine learning)
- Natural language processing (Computer science)
- Support Vector Machines
- Semantic computing
Resource Type
Extent
viii, 56 p. : ill. (some col.)
Language
eng
Copyright Statement
In Copyright
Primary Member of
Peer-reviewed
No
Open Access
No
Handle
https://hdl.handle.net/2286/R.I.17802
Statement of Responsibility
by Ravi Chandravadan Karad
Description Source
Viewed on Nov. 13, 2013
Level of coding
full
Note
thesis
Partial requirement for: M.S., Arizona State University, 2013
bibliography
Includes bibliographical references (p. 54-56)
Field of study: Computer science
System Created
- 2013-07-12 06:17:58
System Modified
- 2021-08-30 01:42:27