No Arabic abstract
Deep learning (DL) based object detection has achieved great progress. These methods typically assume that large amount of labeled training data is available, and training and test data are drawn from an identical distribution. However, the two assumptions are not always hold in practice. Deep domain adaptive object detection (DDAOD) has emerged as a new learning paradigm to address the above mentioned challenges. This paper aims to review the state-of-the-art progress on deep domain adaptive object detection approaches. Firstly, we introduce briefly the basic concepts of deep domain adaptation. Secondly, the deep domain adaptive detectors are classified into five categories and detailed descriptions of representative methods in each category are provided. Finally, insights for future research trend are presented.
Domain adaptation methods face performance degradation in object detection, as the complexity of tasks require more about the transferability of the model. We propose a new perspective on how CNN models gain the transferability, viewing the weights of a model as a series of motion patterns. The directions of weights, and the gradients, can be divided into domain-specific and domain-invariant parts, and the goal of domain adaptation is to concentrate on the domain-invariant direction while eliminating the disturbance from domain-specific one. Current UDA object detection methods view the two directions as a whole while optimizing, which will cause domain-invariant direction mismatch even if the output features are perfectly aligned. In this paper, we propose the domain-specific suppression, an exemplary and generalizable constraint to the original convolution gradients in backpropagation to detach the two parts of directions and suppress the domain-specific one. We further validate our theoretical analysis and methods on several domain adaptive object detection tasks, including weather, camera configuration, and synthetic to real-world adaptation. Our experiment results show significant advance over the state-of-the-art methods in the UDA object detection field, performing a promotion of $10.2sim12.2%$ mAP on all these domain adaptation scenarios.
In real applications, object detectors based on deep networks still face challenges of the large domain gap between the labeled training data and unlabeled testing data. To reduce the gap, recent techniques are proposed by aligning the image/instance-level features between source and unlabeled target domains. However, these methods suffer from the suboptimal problem mainly because of ignoring the category information of object instances. To tackle this issue, we develop a fine-grained domain alignment approach with a well-designed domain classifier bank that achieves the instance-level alignment respecting to their categories. Specifically, we first employ the mean teacher paradigm to generate pseudo labels for unlabeled samples. Then we implement the class-level domain classifiers and group them together, called domain classifier bank, in which each domain classifier is responsible for aligning features of a specific class. We assemble the bare object detector with the proposed fine-grained domain alignment mechanism as the adaptive detector, and optimize it with a developed crossed adaptive weighting mechanism. Extensive experiments on three popular transferring benchmarks demonstrate the effectiveness of our method and achieve the new remarkable state-of-the-arts.
In this paper, we tackle the domain adaptive object detection problem, where the main challenge lies in significant domain gaps between source and target domains. Previous work seeks to plainly align image-level and instance-level shifts to eventually minimize the domain discrepancy. However, they still overlook to match crucial image regions and important instances across domains, which will strongly affect domain shift mitigation. In this work, we propose a simple but effective categorical regularization framework for alleviating this issue. It can be applied as a plug-and-play component on a series of Domain Adaptive Faster R-CNN methods which are prominent for dealing with domain adaptive detection. Specifically, by integrating an image-level multi-label classifier upon the detection backbone, we can obtain the sparse but crucial image regions corresponding to categorical information, thanks to the weakly localization ability of the classification manner. Meanwhile, at the instance level, we leverage the categorical consistency between image-level predictions (by the classifier) and instance-level predictions (by the detection head) as a regularization factor to automatically hunt for the hard aligned instances of target domains. Extensive experiments of various domain shift scenarios show that our method obtains a significant performance gain over original Domain Adaptive Faster R-CNN detectors. Furthermore, qualitative visualization and analyses can demonstrate the ability of our method for attending on the key regions/instances targeting on domain adaptation. Our code is open-source and available at url{https://github.com/Megvii-Nanjing/CR-DA-DET}.
This survey paper specially analyzed computer vision-based object detection challenges and solutions by different techniques. We mainly highlighted object detection by three different trending strategies, i.e., 1) domain adaptive deep learning-based approaches (discrepancy-based, Adversarial-based, Reconstruction-based, Hybrid). We examined general as well as tiny object detection-related challenges and offered solutions by historical and comparative analysis. In part 2) we mainly focused on tiny object detection techniques (multi-scale feature learning, Data augmentation, Training strategy (TS), Context-based detection, GAN-based detection). In part 3), To obtain knowledge-able findings, we discussed different object detection methods, i.e., convolutions and convolutional neural networks (CNN), pooling operations with trending types. Furthermore, we explained results with the help of some object detection algorithms, i.e., R-CNN, Fast R-CNN, Faster R-CNN, YOLO, and SSD, which are generally considered the base bone of CV, CNN, and OD. We performed comparative analysis on different datasets such as MS-COCO, PASCAL VOC07,12, and ImageNet to analyze results and present findings. At the end, we showed future directions with existing challenges of the field. In the future, OD methods and models can be analyzed for real-time object detection, tracking strategies.
Deploying deep learning models on embedded systems has been challenging due to limited computing resources. The majority of existing work focuses on accelerating image classification, while other fundamental vision problems, such as object detection, have not been adequately addressed. Compared with image classification, detection problems are more sensitive to the spatial variance of objects, and therefore, require specialized convolutions to aggregate spatial information. To address this need, recent work introduces dynamic deformable convolution to augment regular convolutions. However, this will lead to inefficient memory accesses of inputs with existing hardware. In this work, we harness the flexibility of FPGAs to develop a novel object detection pipeline with deformable convolutions. We show the speed-accuracy tradeoffs for a set of algorithm modifications including irregular-access versus limited-range and fixed-shape. We then Co-Design a Network CoDeNet with the modified deformable convolution and quantize it to 4-bit weights and 8-bit activations. With our high-efficiency implementation, our solution reaches 26.9 frames per second with a tiny model size of 0.76 MB while achieving 61.7 AP50 on the standard object detection dataset, Pascal VOC. With our higher accuracy implementation, our model gets to 67.1 AP50 on Pascal VOC with only 2.9 MB of parameters-20.9x smaller but 10% more accurate than Tiny-YOLO.