Computer Vision: Improving Detection and Tracking for Occluded and Blurry Settings
Document
Description
Computer vision and tracking has become an area of great interest for many reasons, including self-driving cars, identification of vehicles and drivers on roads, and security camera monitoring, all of which are expanding in the modern digital era. When working with practical systems that are constrained in multiple ways, such as video quality or viewing angle, algorithms that work well theoretically can have a high error rate in practice. This thesis studies several ways in which that error can be minimized.This thesis describes an application in a practical system. This project is to detect, track and count people entering different lanes at an airport security checkpoint, using CCTV videos as a primary source. This thesis improves an existing algorithm that is not optimized for this particular problem and has a high error rate when comparing the algorithm counts with the true volume of users.
The high error rate is caused by many people crowding into security lanes at the same time. The camera from which footage was captured is located at a poor angle, and thus many of the people occlude each other and cause the existing algorithm to miss people. One solution is to count only heads; since heads are smaller than a full body, they will occlude less, and in addition, since the camera is angled from above, the heads in back will appear higher and will not be occluded by people in front. One of the primary improvements to the algorithm is to combine both person detections and head detections to improve the accuracy.
The proposed algorithm also improves the accuracy of detections. The existing algorithm used the COCO training dataset, which works well in scenarios where people are visible and not occluded. However, the available video quality in this project was not very good, with people often blocking each other from the camera’s view. Thus, a different training set was needed that could detect people even in poor-quality frames and with occlusion. The new training set is the first algorithmic improvement, and although occasionally performing worse, corrected the error by 7.25% on average.