
On the relation between statistical learning and perceptual distances

Posted by Alexander Hepburn
Published: 2021
Language: English





It has been demonstrated many times that the behavior of the human visual system is connected to the statistics of natural images. Since machine learning relies on the statistics of training data as well, the above connection has interesting implications when using perceptual distances (which mimic the behavior of the human visual system) as a loss function. In this paper, we aim to unravel the non-trivial relationship between the probability distribution of the data, perceptual distances, and unsupervised machine learning. To this end, we show that perceptual sensitivity is correlated with the probability of an image in its close neighborhood. We also explore the relation between distances induced by autoencoders and the probability distribution of the data used for training them, as well as how these induced distances are correlated with human perception. Finally, we discuss why perceptual distances might not lead to noticeable gains in performance over standard Euclidean distances in common image processing tasks except when data is scarce and the perceptual distance provides regularization.
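As a concrete illustration of the contrast the abstract draws between perceptual and Euclidean distances, the sketch below compares MSE with SSIM on two distortions of roughly equal Euclidean size. SSIM is used here only as a stand-in perceptual proxy; the abstract does not name the specific perceptual metrics the paper studies.

# A minimal sketch, assuming SSIM as a perceptual proxy: two perturbations
# with (roughly) matched MSE that a perceptual distance rates very differently.
import numpy as np
from skimage import data, img_as_float
from skimage.metrics import mean_squared_error, structural_similarity

img = img_as_float(data.camera())
rng = np.random.default_rng(0)

noisy = np.clip(img + rng.normal(0, 0.05, img.shape), 0, 1)   # white noise
shifted = np.clip(img + 0.05, 0, 1)                           # constant shift

for name, distorted in [("white noise", noisy), ("constant shift", shifted)]:
    mse = mean_squared_error(img, distorted)
    ssim = structural_similarity(img, distorted, data_range=1.0)
    print(f"{name}: MSE={mse:.4f}  SSIM={ssim:.3f}")

# The two distortions have similar MSE, but very different SSIM: a Euclidean
# loss treats them alike, while a perceptual distance does not.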




Read also

We introduce a learning strategy for contrast-invariant image registration without requiring imaging data. While classical registration methods accurately estimate the spatial correspondence between images, they solve a costly optimization problem for every image pair. Learning-based techniques are fast at test time, but can only register images with image contrast and geometric content that are similar to those available during training. We focus on removing this image-data dependency of learning methods. Our approach leverages a generative model for diverse label maps and images that exposes networks to a wide range of variability during training, forcing them to learn features invariant to image type (contrast). This strategy results in powerful networks trained to generalize to a broad array of real input images. We present extensive experiments, with a focus on 3D neuroimaging, showing that this strategy enables robust registration of arbitrary image contrasts without the need to retrain for new modalities. We demonstrate registration accuracy that most often surpasses the state of the art both within and across modalities, using a single model. Critically, we show that input labels from which we synthesize images need not be of actual anatomy: training on randomly generated geometric shapes also results in competitive registration performance, albeit slightly less accurate, while alleviating the dependency on real data of any kind. Our code is available at: http://voxelmorph.csail.mit.edu
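The synthesis strategy described above can be illustrated with a minimal sketch: draw a random label map of geometric shapes, then render it with per-label random intensities so that every training sample presents a new "contrast". The function names and parameters below are illustrative assumptions, not the authors' pipeline (their code is at http://voxelmorph.csail.mit.edu).

# A hedged sketch of label-map-to-image synthesis; all names are hypothetical.
import numpy as np

def random_label_map(shape=(128, 128), n_labels=8, n_shapes=20, seed=0):
    """Random geometric shapes (axis-aligned rectangles) as a label map."""
    rng = np.random.default_rng(seed)
    labels = np.zeros(shape, dtype=int)
    for _ in range(n_shapes):
        y, x = rng.integers(0, shape[0] - 8), rng.integers(0, shape[1] - 8)
        h, w = rng.integers(8, 40), rng.integers(8, 40)
        labels[y:y + h, x:x + w] = rng.integers(1, n_labels)
    return labels

def render(labels, seed=0, noise=0.02):
    """Sample one random intensity per label, add noise: a new 'contrast'."""
    rng = np.random.default_rng(seed)
    lut = rng.uniform(0, 1, labels.max() + 1)      # one intensity per label
    img = lut[labels] + rng.normal(0, noise, labels.shape)
    return img.astype(np.float32)

labels = random_label_map()
img_a = render(labels, seed=1)   # same geometry,
img_b = render(labels, seed=2)   # two different synthetic contrasts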
In this paper, we propose an image quality transformer (IQT) that successfully applies a transformer architecture to a perceptual full-reference image quality assessment (IQA) task. Perceptual representation is becoming more important in image quality assessment. In this context, we extract perceptual feature representations from each of the input images using a convolutional neural network (CNN) backbone. The extracted feature maps are fed into the transformer encoder and decoder in order to compare the reference and distorted images. Following the approach of transformer-based vision models, we use an extra learnable quality embedding and position embedding. The output of the transformer is passed to a prediction head to predict a final quality score. The experimental results show that our proposed model has outstanding performance on the standard IQA datasets. For a large-scale IQA dataset containing output images of generative models, our model also shows promising results. The proposed IQT was ranked first among 13 participants in the NTIRE 2021 perceptual image quality assessment challenge. Our work will be an opportunity to further expand the approach for the perceptual IQA task.
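A minimal sketch of the pipeline described above, with a stand-in CNN backbone, a learnable quality token, position embeddings, a transformer encoder, and a regression head. All sizes are illustrative assumptions; the paper's exact IQT architecture (including its transformer decoder) is not reproduced here.

# A hedged, simplified IQT-style model; layer sizes are assumptions.
import torch
import torch.nn as nn

class TinyIQT(nn.Module):
    def __init__(self, dim=64, heads=4, layers=2):
        super().__init__()
        self.backbone = nn.Sequential(                 # stand-in CNN backbone
            nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.quality_token = nn.Parameter(torch.zeros(1, 1, dim))
        # position embedding sized for 64x64 inputs: 1 token + 2 * 16*16 patches
        self.pos = nn.Parameter(torch.zeros(1, 1 + 2 * 16 * 16, dim))
        enc = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=layers)
        self.head = nn.Linear(dim, 1)                  # predicted quality score

    def forward(self, ref, dist):
        def tokens(x):                                 # (B, C, H, W) -> (B, HW, C)
            return self.backbone(x).flatten(2).transpose(1, 2)
        t = torch.cat([self.quality_token.expand(ref.size(0), -1, -1),
                       tokens(ref), tokens(dist)], dim=1)
        t = self.encoder(t + self.pos[:, : t.size(1)])
        return self.head(t[:, 0])                      # read off the quality token

scores = TinyIQT()(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))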
Romuald A. Janik (2021)
We analyze the spaces of images encoded by generative networks of the BigGAN architecture. We find that generic multiplicative perturbations away from the photo-realistic point often lead to images which appear as artistic renditions of the corresponding objects. This demonstrates an emergence of aesthetic properties directly from the structure of the photo-realistic environment coupled with its neural network parametrization. Moreover, modifying a deep semantic part of the neural network encoding leads to the appearance of symbolic visual representations.
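The perturbation idea can be sketched as follows; `generator` stands in for a BigGAN-style network, and the elementwise multiplicative scheme shown is an assumption, since the abstract does not specify the exact perturbation.

# A hedged sketch of a multiplicative perturbation of a generator latent.
import torch

def perturb_multiplicative(z, strength=0.3, seed=0):
    """Elementwise multiplicative noise: z -> z * (1 + strength * eps)."""
    g = torch.Generator().manual_seed(seed)
    eps = torch.randn(z.shape, generator=g)
    return z * (1.0 + strength * eps)

z = torch.randn(1, 128)            # BigGAN-style latent vector
z_art = perturb_multiplicative(z)  # decode both: generator(z) vs generator(z_art)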
Yang Yue, Liuyuan He, Gan He (2018)
Photoreceptors in the retina are coupled by electrical synapses called gap junctions. It has long been established that gap junctions increase the signal-to-noise ratio of photoreceptors. Inspired by electrically coupled photoreceptors, we introduce a simple filter, the PR-filter, with only one variable. On the BSD68 dataset, the PR-filter showed outstanding SSIM performance in blind denoising tasks. It also significantly improved the performance of state-of-the-art convolutional neural network blind denoising on non-Gaussian noise. Its ability to preserve more detail might be attributed to the small receptive field of the photoreceptors.
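The abstract does not define the PR-filter itself, so the sketch below only illustrates gap-junction-style coupling in the spirit it describes: each pixel ("photoreceptor") is mixed with its immediate neighbors under a single coupling variable. The function name and the 4-neighborhood kernel are assumptions, not the paper's filter.

# A hedged sketch of gap-junction-style coupling as local averaging.
import numpy as np
from scipy.ndimage import convolve

def coupled_photoreceptors(img, alpha=0.2):
    """Mix each pixel with its 4-neighbor mean; alpha=0 leaves img unchanged."""
    kernel = np.array([[0, 1, 0],
                       [1, 0, 1],
                       [0, 1, 0]], dtype=float) / 4.0
    return (1 - alpha) * img + alpha * convolve(img, kernel, mode="reflect")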
Identification of a person from good-quality fingerprints has been used by commercial applications and law enforcement agencies for many years; however, identification of a person from latent fingerprints is very difficult and challenging. A latent fingerprint is a fingerprint left on a surface by deposits of oils and/or perspiration from the finger. It is not usually visible to the naked eye but may be detected with special techniques, such as dusting with fine powder and then lifting the pattern of powder with transparent tape. We have evaluated the quality of machine learning techniques that have been implemented in automatic fingerprint identification. In this paper, we use low-quality fingerprints from database DB1 of the Fingerprint Verification Competition (FVC 2002) to conduct our experiments. Fingerprints are processed to find their core points using the Poincare index and enhanced using a diffusion coherence filter, whose performance is known to be good in the high-curvature regions of fingerprints. Seven statistical descriptors based on the Grey-Level Co-Occurrence Matrix (GLCM), computed at four different inter-pixel distances, are then extracted as features and used to train and test the REPTree, RandomTree, J48, Decision Stump, and Random Forest machine learning techniques for personal identification. Experiments are conducted on 80 instances and 28 attributes. Our experiments showed that Random Forest and J48 give better results for latent fingerprints than the other machine learning techniques and can help improve identification accuracy.
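A minimal sketch of the feature pipeline described above, using scikit-image and scikit-learn: six descriptors come from graycoprops, with entropy added by hand as a seventh. The distances, angles, and classifier settings are illustrative assumptions, not the paper's exact configuration.

# A hedged sketch of GLCM features feeding a Random Forest classifier.
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.ensemble import RandomForestClassifier

PROPS = ["contrast", "dissimilarity", "homogeneity", "energy", "correlation", "ASM"]

def glcm_features(img_u8, distances=(1, 2, 3, 4)):
    """Seven GLCM descriptors averaged over four inter-pixel distances."""
    glcm = graycomatrix(img_u8, distances=distances, angles=[0],
                        levels=256, symmetric=True, normed=True)
    feats = [graycoprops(glcm, p).mean() for p in PROPS]
    p = glcm[glcm > 0]
    feats.append(float(-(p * np.log2(p)).sum()))   # entropy, the 7th descriptor
    return np.array(feats)

# Hypothetical usage on pre-enhanced uint8 fingerprint patches X_imgs, labels y:
# X = np.stack([glcm_features(im) for im in X_imgs])
# clf = RandomForestClassifier(n_estimators=100).fit(X, y)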