No Arabic abstract
Some biological mechanisms of early vision are comparatively well understood, but they have yet to be evaluated for their ability to accurately predict and explain human judgments of image similarity. From well-studied simple connectivity patterns in early vision, we derive a novel formalization of the psychophysics of similarity, showing the differential geometry that provides accurate and explanatory accounts of perceptual similarity judgments. These predictions then are further improved via simple regression on human behavioral reports, which in turn are used to construct more elaborate hypothesized neural connectivity patterns. Both approaches outperform standard successful measures of perceived image fidelity from the literature, as well as providing explanatory principles of similarity perception.
The thalamus consists of several histologically and functionally distinct nuclei increasingly implicated in brain pathology and important for treatment, motivating the need for development of fast and accurate thalamic segmentation. The contrast between thalamic nuclei as well as between the thalamus and surrounding tissues is poor in T1 and T2 weighted magnetic resonance imaging (MRI), inhibiting efforts to date to segment the thalamus using standard clinical MRI. Automatic segmentation techniques have been developed to leverage thalamic features better captured by advanced MRI methods, including magnetization prepared rapid acquisition gradient echo (MP-RAGE) , diffusion tensor imaging (DTI), and resting state functional MRI (fMRI). Despite operating on fundamentally different image features, these methods claim a high degree of agreement with the Morel stereotactic atlas of the thalamus. However, no comparison has been undertaken to compare the results of these disparate segmentation methods. We have implemented state-of-the-art structural, diffusion, and functional imaging-based thalamus segmentation techniques and used them on a single set of subjects. We present the first systematic qualitative and quantitative comparison of these methods. We found that functional connectivity-based parcellation exhibited a close correspondence with structural parcellation on the basis of qualitative concordance with the Morel thalamic atlas as well as the quantitative measures of Dice scores and volumetric similarity index.
Motion perception is a critical capability determining a variety of aspects of insects life, including avoiding predators, foraging and so forth. A good number of motion detectors have been identified in the insects visual pathways. Computational modelling of these motion detectors has not only been providing effective solutions to artificial intelligence, but also benefiting the understanding of complicated biological visual systems. These biological mechanisms through millions of years of evolutionary development will have formed solid modules for constructing dynamic vision systems for future intelligent machines. This article reviews the computational motion perception models originating from biological research of insects visual systems in the literature. These motion perception models or neural networks comprise the looming sensitive neuronal models of lobula giant movement detectors (LGMDs) in locusts, the translation sensitive neural systems of direction selective neurons (DSNs) in fruit flies, bees and locusts, as well as the small target motion detectors (STMDs) in dragonflies and hover flies. We also review the applications of these models to robots and vehicles. Through these modelling studies, we summarise the methodologies that generate different direction and size selectivity in motion perception. At last, we discuss about multiple systems integration and hardware realisation of these bio-inspired motion perception models.
Developments in machine learning interpretability techniques over the past decade have provided new tools to observe the image regions that are most informative for classification and localization in artificial neural networks (ANNs). Are the same regions similarly informative to human observers? Using data from 78 new experiments and 6,610 participants, we show that passive attention techniques reveal a significant overlap with human visual selectivity estimates derived from 6 distinct behavioral tasks including visual discrimination, spatial localization, recognizability, free-viewing, cued-object search, and saliency search fixations. We find that input visualizations derived from relatively simple ANN architectures probed using guided backpropagation methods are the best predictors of a shared component in the joint variability of the human measures. We validate these correlational results with causal manipulations using recognition experiments. We show that images masked with ANN attention maps were easier for humans to classify than control masks in a speeded recognition experiment. Similarly, we find that recognition performance in the same ANN models was likewise influenced by masking input images using human visual selectivity maps. This work contributes a new approach to evaluating the biological and psychological validity of leading ANNs as models of human vision: by examining their similarities and differences in terms of their visual selectivity to the information contained in images.
Transformers with remarkable global representation capacities achieve competitive results for visual tasks, but fail to consider high-level local pattern information in input images. In this paper, we present a generic Dual-stream Network (DS-Net) to fully explore the representation capacity of local and global pattern features for image classification. Our DS-Net can simultaneously calculate fine-grained and integrated features and efficiently fuse them. Specifically, we propose an Intra-scale Propagation module to process two different resolutions in each block and an Inter-Scale Alignment module to perform information interaction across features at dual scales. Besides, we also design a Dual-stream FPN (DS-FPN) to further enhance contextual information for downstream dense predictions. Without bells and whistles, the propsed DS-Net outperforms Deit-Small by 2.4% in terms of top-1 accuracy on ImageNet-1k and achieves state-of-the-art performance over other Vision Transformers and ResNets. For object detection and instance segmentation, DS-Net-Small respectively outperforms ResNet-50 by 6.4% and 5.5 % in terms of mAP on MSCOCO 2017, and surpasses the previous state-of-the-art scheme, which significantly demonstrates its potential to be a general backbone in vision tasks. The code will be released soon.
Image quality assessment (IQA) models aim to establish a quantitative relationship between visual images and their perceptual quality by human observers. IQA modeling plays a special bridging role between vision science and engineering practice, both as a test-bed for vision theories and computational biovision models, and as a powerful tool that could potentially make profound impact on a broad range of image processing, computer vision, and computer graphics applications, for design, optimization, and evaluation purposes. IQA research has enjoyed an accelerated growth in the past two decades. Here we present an overview of IQA methods from a Bayesian perspective, with the goals of unifying a wide spectrum of IQA approaches under a common framework and providing useful references to fundamental concepts accessible to vision scientists and image processing practitioners. We discuss the implications of the successes and limitations of modern IQA methods for biological vision and the prospect for vision science to inform the design of future artificial vision systems.