Date: 2017
Journal: ICIP
SORT is a simple framework that performs Kalman filtering in image space and frame-by-frame data association using the Hungarian method, with an association metric that measures bounding box overlap
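The overlap-based association described above can be sketched in a few lines. This is a minimal illustration, not the SORT authors' code: boxes are assumed to be (x1, y1, x2, y2) corner tuples, the `min_iou` threshold of 0.3 is an assumed value, and a brute-force search over permutations stands in for the Hungarian method (it finds the same optimum but is only practical for a handful of boxes).

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, min_iou=0.3):
    """Return (track_index, detection_index) pairs that maximize total IoU.

    Brute-force stand-in for the Hungarian method.
    Assumes len(tracks) <= len(detections).
    """
    best, best_score = (), -1.0
    for perm in permutations(range(len(detections)), len(tracks)):
        score = sum(iou(tracks[t], detections[d]) for t, d in enumerate(perm))
        if score > best_score:
            best, best_score = perm, score
    # Reject assigned pairs whose overlap is too small to be trusted.
    return [(t, d) for t, d in enumerate(best)
            if iou(tracks[t], detections[d]) >= min_iou]


tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]
matches = associate(tracks, detections)
```

Here each track is paired with the detection that shifted by one pixel, and the far-away third detection stays unmatched.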
But it returns a relatively high number of identity switches, as the employed association metric is only accurate when state estimation uncertainty is low
Deep SORT overcomes this issue by replacing the association metric with a more informed metric that combines motion and appearance information
Deep SORT increases robustness against misses and occlusions while keeping the system easy to implement, efficient, and applicable to online scenarios
The track handling and Kalman filtering framework is mostly identical to the original formulation
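State space is defined on eight dimensions: bounding box center position (u, v), aspect ratio γ, height h, and their respective velocities in image coordinates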
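As a rough sketch of such a state, the snippet below converts a box into a (center, aspect ratio, height) measurement and applies a constant-velocity prediction to the state mean. The function names are hypothetical, the input box is assumed to be top-left (x, y, w, h), aspect ratio is taken as width / height, and the covariance update of a real Kalman filter is left out for brevity.

```python
def to_measurement(x, y, w, h):
    """Convert a top-left (x, y, w, h) box to (u, v, gamma, h):
    center position, aspect ratio (width / height), and height."""
    return (x + w / 2.0, y + h / 2.0, w / h, h)


def initial_state(measurement):
    """Eight-dimensional state: the measurement plus four zero velocities."""
    return list(measurement) + [0.0, 0.0, 0.0, 0.0]


def predict(state, dt=1.0):
    """Constant-velocity prediction of the state mean only (no covariance)."""
    pos, vel = state[:4], state[4:]
    return [p + dt * v for p, v in zip(pos, vel)] + list(vel)


state = initial_state(to_measurement(10, 20, 40, 80))
state[4] = 2.0            # pretend the filter estimated 2 px/frame along u
predicted = predict(state)
```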
Tracks that exceed a predefined maximum age are considered to have left the scene and are deleted from the track set
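A minimal sketch of this age-based deletion, assuming each track keeps a counter of frames since its last successful association. The `Track` class, its method names, and the `max_age` value used in the example are illustrative, not taken from the paper.

```python
class Track:
    def __init__(self, track_id):
        self.track_id = track_id
        self.frames_since_update = 0  # frames since the last matched detection

    def mark_missed(self):
        """Called for every frame in which no detection was associated."""
        self.frames_since_update += 1

    def mark_matched(self):
        """Called when a detection was associated; resets the age counter."""
        self.frames_since_update = 0


def prune(tracks, max_age):
    """Keep only tracks whose age does not exceed max_age."""
    return [t for t in tracks if t.frames_since_update <= max_age]


a, b = Track(1), Track(2)
for _ in range(3):
    a.mark_missed()
    b.mark_missed()
b.mark_matched()              # track 2 reappears, track 1 does not
alive = prune([a, b], max_age=2)
```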
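Motion and appearance information are integrated by combining two metrics: the (squared) Mahalanobis distance between predicted Kalman states and new detections for motion, and the cosine distance between appearance descriptors for appearance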
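One way to picture the combination is a weighted sum of a motion term and an appearance term. The sketch below makes several simplifying assumptions: the track covariance is diagonal (so the Mahalanobis distance needs no matrix inverse), appearance is compared by cosine distance between two descriptor vectors, and the weight `lam` and all function names are hypothetical.

```python
import math


def squared_mahalanobis_diag(detection, mean, variances):
    """Squared Mahalanobis distance, assuming a diagonal covariance."""
    return sum((d - m) ** 2 / v for d, m, v in zip(detection, mean, variances))


def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def combined_cost(d_motion, d_appearance, lam):
    """Weighted sum; lam = 1 uses motion only, lam = 0 appearance only."""
    return lam * d_motion + (1.0 - lam) * d_appearance


d_motion = squared_mahalanobis_diag([2.0, 0.0], [0.0, 0.0], [4.0, 1.0])
d_appearance = cosine_distance([1.0, 0.0], [0.0, 1.0])
cost = combined_cost(d_motion, d_appearance, lam=0.5)
```

Setting `lam` close to 0 lets appearance dominate, which is the sensible choice when the motion model is unreliable, for example under strong camera motion.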
When two tracks compete for the same detection, the Mahalanobis distance favors the one with larger uncertainty, because a larger covariance effectively reduces the distance, measured in standard deviations, of any detection to the projected track mean
This is undesired behaviour, as it can lead to increased track fragmentation and unstable tracks
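A one-dimensional toy example makes the effect concrete. The numbers are invented for illustration: one track was updated recently and has a tight standard deviation, the other has been occluded for a while and its uncertainty has grown.

```python
def distance_in_std_devs(detection, track_mean, track_std):
    """One-dimensional Mahalanobis distance: offset in standard deviations."""
    return abs(detection - track_mean) / track_std


detection = 10.0

# Recently updated track: 2 px away, small uncertainty.
recent = distance_in_std_devs(detection, track_mean=8.0, track_std=1.0)    # 2.0

# Long-occluded track: 8 px away, but large uncertainty.
occluded = distance_in_std_devs(detection, track_mean=2.0, track_std=8.0)  # 1.0
```

Although the occluded track is four times farther away in pixels, its distance in standard deviations is smaller, so a purely Mahalanobis-based assignment would hand it the detection.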
In a final matching stage, intersection over union (IoU) association is run, as proposed in the original SORT algorithm
A wide residual network with two convolutional layers followed by six residual blocks is employed