Full metadata
Title
Referring Expression Comprehension for CLEVR-Ref+ Dataset
Description
Referring Expression Comprehension (REC) is an important area of research in Natural Language Processing (NLP) and vision domain. It involves locating an object in an image described by a natural language referring expression. This task requires information from both Natural Language and Vision aspect. The task is compositional in nature as it requires visual reasoning as underlying process along with relationships among the objects in the image. Recent works based on modular networks have
displayed to be an effective framework for performing visual reasoning task.
Although this approach is effective, it has been established that the current benchmark datasets for referring expression comprehension suffer from bias. Recent work on CLEVR-Ref+ dataset deals with bias issues by constructing a synthetic dataset
and provides an approach for the aforementioned task which performed better than the previous state-of-the-art models as well as showing the reasoning process. This work aims to improve the performance on CLEVR-Ref+ dataset and achieve comparable interpretability. In this work, the neural module network approach with the attention map technique is employed. The neural module network is composed of the primitive operation modules which are specific to their functions and the output is generated using a separate segmentation module. From empirical results, it is clear that this approach is performing significantly better than the current State-of-theart in one aspect (Predicted programs) and achieving comparable results for another aspect (Ground truth programs)
displayed to be an effective framework for performing visual reasoning task.
Although this approach is effective, it has been established that the current benchmark datasets for referring expression comprehension suffer from bias. Recent work on CLEVR-Ref+ dataset deals with bias issues by constructing a synthetic dataset
and provides an approach for the aforementioned task which performed better than the previous state-of-the-art models as well as showing the reasoning process. This work aims to improve the performance on CLEVR-Ref+ dataset and achieve comparable interpretability. In this work, the neural module network approach with the attention map technique is employed. The neural module network is composed of the primitive operation modules which are specific to their functions and the output is generated using a separate segmentation module. From empirical results, it is clear that this approach is performing significantly better than the current State-of-theart in one aspect (Predicted programs) and achieving comparable results for another aspect (Ground truth programs)
Date Created
2020
Contributors
- Rathor, Kuldeep Singh (Author)
- Baral, Chitta (Thesis advisor)
- Yang, Yezhou (Committee member)
- Simeone, Michael (Committee member)
- Arizona State University (Publisher)
Topical Subject
Resource Type
Extent
104 pages
Language
eng
Copyright Statement
In Copyright
Primary Member of
Peer-reviewed
No
Open Access
No
Handle
https://hdl.handle.net/2286/R.I.62696
Level of coding
minimal
Note
Masters Thesis Computer Science 2020
System Created
- 2020-12-08 11:57:24
System Modified
- 2021-08-26 09:47:01
- 3 years 2 months ago
Additional Formats