AvaCAR
Document
Description
For a system of autonomous vehicles operating together in a traffic scene, a 3D
understanding of the participants in the field of view or surroundings is essential for
assessing the safe operation of those involved. This problem can be decomposed into online
pose and shape estimation, which has been a core research area of computer vision for over
a decade. This work is an add-on that supports and improves the joint estimation of the pose
and shape of vehicles from monocular cameras. Jointly estimating the vehicle pose and shape
online is enabled by an offline reconstruction pipeline. In the offline reconstruction step,
an approach is formulated to obtain the 3D shape of a vehicle with labeled keypoints.
This work proposes a multi-view reconstruction pipeline that uses images and masks to
create an approximate vehicle shape, which can serve as a shape prior. A 3D model-fitting
optimization approach is then developed to refine the shape prior using high-quality
computer-aided design (CAD) models of vehicles. A dataset of such 3D vehicles, each
annotated with 20 keypoints, is prepared and called the AvaCAR dataset. The
AvaCAR dataset can be used to estimate vehicle shape and pose without the need to
collect the large amounts of data otherwise required to adequately train a neural
network. The online reconstruction can use this synthetic dataset to generate novel
viewpoints and simultaneously train a neural network for pose and shape estimation. Most
methods in the current literature that use deep neural networks trained to estimate the
pose of an object from a single image are inherently biased toward the viewpoints of the
images used. This approach aims to address this limitation by providing the online
estimation with a shape prior that can generate novel views to account for the viewpoint
bias. The dataset is provided with ground-truth extrinsic parameters
and compact vector-based shape representations, which, along with the multi-view
dataset, can be used to efficiently train neural networks for vehicle pose and shape
estimation. The vehicles in this library are evaluated with standard metrics to ensure
they are capable of aiding online estimation and model-based tracking.
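
To illustrate how annotated 3D keypoints and extrinsic parameters can produce training
views from new camera positions, the following is a minimal sketch rather than the
AvaCAR pipeline itself. It assumes a pinhole camera with axes x right, y down, z forward
and a world frame with z pointing up. The function names, the look-at construction, and
the intrinsic matrix K are hypothetical choices made only for this example.

```python
import numpy as np


def look_at_extrinsics(camera_position, target, up=(0.0, 0.0, 1.0)):
    """Return world-to-camera rotation R (3x3) and translation t (3,).

    Assumed convention: camera x points right, y points down, z points forward.
    The construction is degenerate if the viewing direction is parallel to `up`.
    """
    camera_position = np.asarray(camera_position, dtype=float)
    forward = np.asarray(target, dtype=float) - camera_position
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, np.asarray(up, dtype=float))
    right /= np.linalg.norm(right)
    down = np.cross(forward, right)
    R = np.stack([right, down, forward])  # rows are the camera axes in world coordinates
    t = -R @ camera_position
    return R, t


def project_keypoints(keypoints_world, R, t, K):
    """Project (N, 3) world-frame keypoints to (N, 2) pixel coordinates.

    Also returns the depth of each keypoint along the camera z axis, so that
    points behind the camera (depth <= 0) can be filtered out by the caller.
    """
    cam = keypoints_world @ R.T + t   # world frame -> camera frame
    uvw = cam @ K.T                   # camera frame -> homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3], cam[:, 2]


# Hypothetical intrinsics for a 640x480 image (focal length and principal point in pixels).
K = np.array([[1000.0, 0.0, 320.0],
              [0.0, 1000.0, 240.0],
              [0.0, 0.0, 1.0]])

# Placeholder coordinates standing in for the 20 annotated keypoints of one vehicle (meters).
rng = np.random.default_rng(0)
keypoints = rng.uniform(low=[-2.0, -1.0, 0.0], high=[2.0, 1.0, 1.5], size=(20, 3))

# Sample novel viewpoints on a circle around the vehicle and project the keypoints into each.
for azimuth in np.linspace(0.0, 2.0 * np.pi, num=4, endpoint=False):
    position = np.array([8.0 * np.cos(azimuth), 8.0 * np.sin(azimuth), 1.5])
    R, t = look_at_extrinsics(position, target=np.zeros(3))
    pixels, depth = project_keypoints(keypoints, R, t, K)
    print(f"azimuth={np.degrees(azimuth):5.1f} deg, keypoints in front: {(depth > 0).sum()}")
```

Each sampled pose yields a set of 2D keypoint locations paired with the known rotation
and translation, which is the kind of labeled view a pose-estimation network could be
trained on. A real setup would substitute the dataset's own extrinsics and camera model.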