Yujin Chen, Zhigang Tu, Di Kang (2021)
Reconstructing a 3D hand from a single-view RGB image is challenging because of the variety of hand configurations and the ambiguity of depth. To reliably reconstruct a 3D hand from a monocular image, most state-of-the-art methods rely heavily on 3D annotations at the training stage, but obtaining 3D annotations is expensive. To reduce the reliance on labeled training data, we propose S2HAND, a self-supervised 3D hand reconstruction network that jointly estimates pose, shape, texture, and the camera viewpoint. Specifically, we obtain geometric cues from the input image through easily accessible 2D detected keypoints. To learn an accurate hand reconstruction model from these noisy geometric cues, we exploit the consistency between 2D and 3D representations and propose a set of novel losses to rationalize the outputs of the neural network. For the first time, we demonstrate the feasibility of training an accurate 3D hand reconstruction network without relying on manual annotations. Our experiments show that the proposed method achieves performance comparable to recent fully-supervised methods while using less supervision.
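The abstract does not spell out the form of its 2D-3D consistency losses. As a rough illustration only, the sketch below shows one common way such a term can be written: predicted 3D joints are projected into the image and compared with detected 2D keypoints, weighted by detector confidence. The weak-perspective camera, the function names, and the confidence weighting are all assumptions made for this example, not details taken from S2HAND.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def project_weak_perspective(
    joints_3d: List[Point3D], scale: float, trans: Point2D
) -> List[Point2D]:
    """Project 3D joints with a weak-perspective camera (assumed model).

    Depth is dropped, then the x/y coordinates are scaled and translated.
    """
    tx, ty = trans
    return [(scale * x + tx, scale * y + ty) for x, y, _ in joints_3d]


def keypoint_consistency_loss(
    joints_3d: List[Point3D],
    keypoints_2d: List[Point2D],
    confidences: List[float],
    scale: float,
    trans: Point2D,
) -> float:
    """Confidence-weighted mean squared distance between projected joints
    and detected 2D keypoints (a hypothetical stand-in for a 2D-3D
    consistency term)."""
    projected = project_weak_perspective(joints_3d, scale, trans)
    total = 0.0
    weight_sum = 0.0
    for (px, py), (kx, ky), conf in zip(projected, keypoints_2d, confidences):
        total += conf * ((px - kx) ** 2 + (py - ky) ** 2)
        weight_sum += conf
    # With no confident detections there is nothing to supervise against.
    return total / weight_sum if weight_sum > 0 else 0.0


if __name__ == "__main__":
    joints = [(0.0, 0.0, 1.0), (1.0, 2.0, 3.0)]
    detections = [(1.0, 1.0), (3.0, 6.0)]
    print(keypoint_consistency_loss(joints, detections, [1.0, 0.5], 2.0, (1.0, 1.0)))
```

Weighting by detector confidence is one plausible way to cope with the noisy keypoints the abstract mentions: low-confidence detections contribute less to the loss.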
Person Search jointly solves the problems of Person Detection and Person Re-identification (Re-ID): the target person must be located in a large number of uncropped images. Over the past few years, Person Search based on deep learning has made great progress. Visual attributes of a person play a key role in retrieving the query person; they have been explored in Re-ID but ignored in Person Search. We therefore introduce attribute learning into the model, allowing attribute features to be used for the retrieval task. Specifically, we propose a simple and effective model called Multi-Attribute Enhancement (MAE), which introduces attribute tags to learn local features. In addition to learning the global representation of pedestrians, it also learns the local representation, and combines the two to learn robust features that improve search performance. We verify the effectiveness of our module on the existing benchmark datasets CUHK-SYSU and PRW. Our model achieves state-of-the-art performance among end-to-end methods, reaching 91.8% mAP and 93.0% rank-1 on CUHK-SYSU. Codes and models are available at https://github.com/chenlq123/MAE.
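For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how mAP and rank-1 can be computed from ranked retrieval results. It assumes each query has already been reduced to a list of booleans (best-ranked candidate first) marking whether each candidate is the query person. The function names are hypothetical, and the official CUHK-SYSU and PRW evaluation protocols include further details (for example, how undetected ground-truth instances are counted) that are not modeled here.

```python
from typing import List, Tuple


def average_precision(ranked_matches: List[bool]) -> float:
    """Average precision for one query, normalized by the number of
    matches that appear in the ranked list (a simplifying assumption)."""
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / hits if hits else 0.0


def evaluate(queries: List[List[bool]]) -> Tuple[float, float]:
    """Return (mAP, rank-1 accuracy) averaged over all queries."""
    if not queries:
        return 0.0, 0.0
    aps = [average_precision(q) for q in queries]
    # Rank-1 counts a query as correct when its top candidate is a match.
    rank1 = [1.0 if q and q[0] else 0.0 for q in queries]
    return sum(aps) / len(queries), sum(rank1) / len(queries)


if __name__ == "__main__":
    mean_ap, rank1 = evaluate([[True, False, True], [False, True]])
    print(f"mAP={mean_ap:.4f} rank-1={rank1:.4f}")
```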
Yujin Chen, Zhigang Tu, Di Kang (2020)
Accurate 3D reconstruction of the hand and object shape from a hand-object image is important for understanding human-object interaction as well as daily human activities. Unlike bare hand pose estimation, hand-object interaction poses a strong constraint on both the hand and the object it manipulates, which suggests that hand configuration may be crucial contextual information for the object, and vice versa. However, current approaches address this task by training a two-branch network to reconstruct the hand and object separately, with little communication between the two branches. In this work, we propose to consider hand and object jointly in feature space and explore the reciprocity of the two branches. We extensively investigate cross-branch feature fusion architectures with MLP or LSTM units. Among the investigated architectures, a variant with LSTM units that enhances the object feature with the hand feature shows the best performance gain. Moreover, we employ an auxiliary depth estimation module to augment the input RGB image with the estimated depth map, which further improves reconstruction accuracy. Experiments conducted on public datasets demonstrate that our approach significantly outperforms existing approaches in terms of object reconstruction accuracy.