Abstract: Transformer-based object detection models usually adopt an encoding-decoding architecture that mainly combines self-attention (SA) and multilayer perceptron (MLP). Although this architecture ...
Abstract: Multi-object tracking (MOT) aims to estimate the bounding boxes and ID labels of objects in videos. The challenging issue in this task is to alleviate competitive learning between the ...