ترغب بنشر مسار تعليمي؟ اضغط هنا

ChaLearn Looking at People: Inpainting and Denoising challenges

443   0   0.0 ( 0 )
 نشر من قبل Meysam Madadi
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Dealing with incomplete information is a well studied problem in the context of machine learning and computational intelligence. However, in the context of computer vision, the problem has only been studied in specific scenarios (e.g., certain types of occlusions in specific types of images), although it is common to have incomplete information in visual data. This chapter describes the design of an academic competition focusing on inpainting of images and video sequences that was part of the competition program of WCCI2018 and had a satellite event collocated with ECCV2018. The ChaLearn Looking at People Inpainting Challenge aimed at advancing the state of the art on visual inpainting by promoting the development of methods for recovering missing and occluded information from images and video. Three tracks were proposed in which visual inpainting might be helpful but still challenging: human body pose estimation, text overlays removal and fingerprint denoising. This chapter describes the design of the challenge, which includes the release of three novel datasets, and the description of evaluation metrics, baselines and evaluation protocol. The results of the challenge are analyzed and discussed in detail and conclusions derived from this event are outlined.



قيم البحث

اقرأ أيضاً

Capturing the `mutual gaze of people is essential for understanding and interpreting the social interactions between them. To this end, this paper addresses the problem of detecting people Looking At Each Other (LAEO) in video sequences. For this pur pose, we propose LAEO-Net, a new deep CNN for determining LAEO in videos. In contrast to previous works, LAEO-Net takes spatio-temporal tracks as input and reasons about the whole track. It consists of three branches, one for each characters tracked head and one for their relative position. Moreover, we introduce two new LAEO datasets: UCO-LAEO and AVA-LAEO. A thorough experimental evaluation demonstrates the ability of LAEONet to successfully determine if two people are LAEO and the temporal window where it happens. Our model achieves state-of-the-art results on the existing TVHID-LAEO video dataset, significantly outperforming previous approaches. Finally, we apply LAEO-Net to social network analysis, where we automatically infer the social relationship between pairs of people based on the frequency and duration that they LAEO.
163 - Weiya Fan 2020
Fingerprint image denoising is a very important step in fingerprint identification. to improve the denoising effect of fingerprint image,we have designs a fingerprint denoising algorithm based on deep encoder-decoder network,which encoder subnet to l earn the fingerprint features of noisy images.the decoder subnet reconstructs the original fingerprint image based on the features to achieve denoising, while using the dilated convolution in the network to increase the receptor field without increasing the complexity and improve the network inference speed. In addition, feature fusion at different levels of the network is achieved through the introduction of residual learning, which further restores the detailed features of the fingerprint and improves the denoising effect. Finally, the experimental results show that the algorithm enables better recovery of edge, line and curve features in fingerprint images, with better visual effects and higher peak signal-to-noise ratio (PSNR) compared to other methods.
Deep learning based image denoising methods have been recently popular due to their improved performance. Traditionally, these methods are trained in a supervised manner, requiring a set of noisy input and clean target image pairs. More recently, sel f-supervised approaches have been proposed to learn denoising from only noisy images. These methods assume that noise across pixels is statistically independent, and the underlying image pixels show spatial correlations across neighborhoods. These methods rely on a masking approach that divides the image pixels into two disjoint sets, where one is used as input to the network while the other is used to define the loss. However, these previous self-supervised approaches rely on a purely data-driven regularization neural network without explicitly taking the masking model into account. In this work, building on these self-supervised approaches, we introduce Noise2Inpaint (N2I), a training approach that recasts the denoising problem into a regularized image inpainting framework. This allows us to use an objective function, which can incorporate different statistical properties of the noise as needed. We use algorithm unrolling to unroll an iterative optimization for solving this objective function and train the unrolled network end-to-end. The training paradigm follows the masking approach from previous works, splitting the pixels into two disjoint sets. Importantly, one of these is now used to impose data fidelity in the unrolled network, while the other still defines the loss. We demonstrate that N2I performs successful denoising on real-world datasets, while better preserving details compared to its purely data-driven counterpart Noise2Self.
Modern methods for counting people in crowded scenes rely on deep networks to estimate people densities in individual images. As such, only very few take advantage of temporal consistency in video sequences, and those that do only impose weak smoothn ess constraints across consecutive frames. In this paper, we advocate estimating people flows across image locations between consecutive images and inferring the people densities from these flows instead of directly regressing them. This enables us to impose much stronger constraints encoding the conservation of the number of people. As a result, it significantly boosts performance without requiring a more complex architecture. Furthermore, it allows us to exploit the correlation between people flow and optical flow to further improve the results. We also show that leveraging people conservation constraints in both a spatial and temporal manner makes it possible to train a deep crowd counting model in an active learning setting with much fewer annotations. This significantly reduces the annotation cost while still leading to similar performance to the full supervision case.
Current vision systems are trained on huge datasets, and these datasets come with costs: curation is expensive, they inherit human biases, and there are concerns over privacy and usage rights. To counter these costs, interest has surged in learning f rom cheaper data sources, such as unlabeled images. In this paper we go a step further and ask if we can do away with real image datasets entirely, instead learning from noise processes. We investigate a suite of image generation models that produce images from simple random processes. These are then used as training data for a visual representation learner with a contrastive loss. We study two types of noise processes, statistical image models and deep generative models under different random initializations. Our findings show that it is important for the noise to capture certain structural properties of real data but that good performance can be achieved even with processes that are far from realistic. We also find that diversity is a key property to learn good representations. Datasets, models, and code are available at https://mbaradad.github.io/learning_with_noise.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا