Visual Perception, Prediction and Understanding with Relations

Deep learning has enabled rapid progress in computer vision applications such as image recognition and object detection. To improve accuracy further, deeper and wider neural networks with diverse architectures have been proposed for better feature extraction. Although the resulting gains are impressive, each further improvement is increasingly marginal and comes at significantly higher computational cost. One solution is to compress the ever-growing model by dropping less important weights or channels, an effective approach that has been well explored. An alternative is to exploit the rich relational information in the data, which can also improve accuracy at reasonable overhead. This work makes progress toward efficient and accurate visual tasks, including detection, prediction, and understanding, by using relations.
For object detection, a novel approach, Graph Assisted Reasoning (GAR), is proposed that uses a heterogeneous graph to model object-object and object-scene relations. GAR fuses the features of neighboring object nodes and scene nodes, and thereby recognizes objects more accurately than it could from individual object nodes alone. Moreover, compared with previous approaches based on Recurrent Neural Networks (RNNs), GAR's lightweight, loosely coupled architecture facilitates its integration into the object detection module.
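
As a rough illustration of fusing an object node's feature with features from its neighboring object nodes and a scene node, consider the minimal sketch below. It is not the GAR implementation: the function names (`mean_vectors`, `fuse_node_features`), the single scene node, the mean aggregation, and the `self_weight` mixing factor are all assumptions made for this example.

```python
from typing import List

Vector = List[float]


def mean_vectors(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(values) / count for values in zip(*vectors)]


def fuse_node_features(
    object_feats: List[Vector],
    scene_feat: Vector,
    neighbors: List[List[int]],
    self_weight: float = 0.5,
) -> List[Vector]:
    """Mix each object's feature with the mean of its context nodes.

    Hypothetical helper: `neighbors[i]` lists the object nodes linked to
    object i, and one scene node is assumed to be linked to every object.
    """
    fused = []
    for idx, feat in enumerate(object_feats):
        # Context = neighboring object nodes plus the scene node.
        context = [object_feats[j] for j in neighbors[idx]] + [scene_feat]
        context_mean = mean_vectors(context)
        # Convex combination of the node's own feature and its context.
        fused.append([
            self_weight * own + (1.0 - self_weight) * ctx
            for own, ctx in zip(feat, context_mean)
        ])
    return fused
```

Because the fusion step only reads node features and an adjacency list, a sketch of this kind could sit after a detector's feature extractor without changing the detector itself, which is one way to picture the low coupling described above.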

For trajectory prediction, a novel approach, Diverse Attention RNN (DAT-RNN), is proposed to handle the diversity of trajectories and to model relations between neighbors. DAT-RNN integrates both temporal and spatial relations to improve prediction under various circumstances.
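
The abstract does not specify how DAT-RNN computes attention, so the following is only a generic sketch of spatial attention over neighbors: a target's hidden state scores each neighbor's hidden state by dot product, and the softmax-weighted sum forms a pooled context. The names `softmax` and `attend_to_neighbors` and the dot-product scoring are assumptions for illustration, not the method proposed in this work.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attend_to_neighbors(
    query: Vector, neighbor_states: List[Vector]
) -> Tuple[List[float], Vector]:
    """Return attention weights and the weighted sum of neighbor states.

    Hypothetical helper: `query` stands for the target's hidden state and
    `neighbor_states` for the hidden states of nearby trajectories.
    """
    # Dot-product relevance of each neighbor to the target.
    scores = [
        sum(q * h for q, h in zip(query, state)) for state in neighbor_states
    ]
    weights = softmax(scores)
    # Weighted sum of neighbor states, one dimension at a time.
    pooled = [
        sum(w * state[dim] for w, state in zip(weights, neighbor_states))
        for dim in range(len(query))
    ]
    return weights, pooled
```

In a recurrent model, a pooled vector like this would typically be computed at every time step and fed to the recurrent cell together with the target's own observation, which is one way temporal and spatial relations can be combined.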

Finally, this work presents a novel relation implication-enhanced (RIE) approach that improves relation detection by exploiting relation direction and implication. With relation implication, the scene graph generation (SGG) model is exposed to more ground-truth information, which mitigates overfitting to biased datasets. Moreover, the enhancement with relation implication is compatible with various context encoding schemes.
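
One way to picture how implication and direction can expose a model to more ground-truth information is label expansion: an annotated triple also yields the triples it implies and, for directional predicates, the reversed triple. The sketch below assumes this reading. The `IMPLIES` and `INVERSE` tables and the `expand_relations` function are hypothetical, and their entries are illustrative rather than taken from this work.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical implication table: a specific predicate implies a general one.
IMPLIES: Dict[str, Set[str]] = {
    "riding": {"on"},
    "standing on": {"on"},
}

# Hypothetical direction table: a predicate and its reversed counterpart.
INVERSE: Dict[str, str] = {
    "above": "below",
    "below": "above",
}


def expand_relations(triples: Iterable[Triple]) -> Set[Triple]:
    """Add implied and reversed triples to a set of annotated triples."""
    annotated = list(triples)
    expanded = set(annotated)
    for subj, pred, obj in annotated:
        # Implication: keep the same direction, use the more general predicate.
        for implied in IMPLIES.get(pred, ()):
            expanded.add((subj, implied, obj))
        # Direction: swap subject and object under the inverse predicate.
        if pred in INVERSE:
            expanded.add((obj, INVERSE[pred], subj))
    return expanded
```

Since a step like this changes only the training labels, it is independent of how the model encodes context, which is consistent with the compatibility noted above.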

Comprehensive experiments on benchmark datasets demonstrate the efficacy of the proposed approaches.