Title
Towards Robust VQA: Evaluations and Methods
Description
Visual Question Answering (VQA) is an increasingly important multi-modal task in which models answer textual questions about input images. Numerous VQA datasets have been proposed to train and evaluate models. However, existing benchmarks focus almost exclusively on textual distribution shifts rather than joint shifts across modalities, which limits how well they can assess model robustness and generalization. To address this gap, this work introduces a new multi-modal VQA benchmark dataset that, for the first time, combines both visual and textual distribution shifts between the training and test sets. This challenging benchmark exposes vulnerabilities in existing models that rely on spurious correlations and overfit to dataset biases. The dataset advances the field by enabling more robust model training and more rigorous evaluation of generalization under multi-modal distribution shift. In addition, a new few-shot multi-modal prompt fusion model is proposed to better adapt models to downstream VQA tasks. The model incorporates a prompt encoder module and a dual-path design to align and fuse image and text prompts, a prompt learning approach tailored to multi-modal learning across vision and language. Together, the benchmark dataset and the prompt fusion model address key limitations in evaluating and improving VQA model robustness, and they extend the methodology for training models that are resilient to multi-modal distribution shifts.
Date Created
2023
Contributors
- Jyothi Unni, Suraj (Author)
- Liu, Huan (Thesis advisor)
- Davulcu, Hasan (Committee member)
- Bryan, Chris (Committee member)
- Arizona State University (Publisher)
Extent
52 pages
Language
eng
Copyright Statement
In Copyright
Peer-reviewed
No
Open Access
No
Handle
https://hdl.handle.net/2286/R.2.N.190815
Level of coding
minimal
Note
Partial requirement for: M.S., Arizona State University, 2023
Field of study: Computer Science
System Created
- 2023-12-14 01:28:04
System Modified
- 2023-12-14 01:28:09