
Image quality assessment (IQA) is an important research topic for understanding and improving visual experience. The current state-of-the-art IQA methods are based on convolutional neural networks (CNNs). The performance of CNN-based models is often compromised by the fixed shape constraint in batch training. To accommodate this, the input images are usually resized and cropped to a fixed shape, causing image quality degradation. To address this, we design a multi-scale image quality Transformer (MUSIQ) to process native resolution images with varying sizes and aspect ratios. With a multi-scale image representation, our proposed method can capture image quality at different granularities. Furthermore, a novel hash-based 2D spatial embedding and a scale embedding are proposed to support the positional embedding in the multi-scale representation. Experimental results verify that our method can achieve state-of-the-art performance on multiple large-scale IQA datasets such as PaQ-2-PiQ, SPAQ and KonIQ-10k.
In the present paper, we carry out a systematic study of the flavor invariants and their renormalization-group equations (RGEs) in the leptonic sector with three generations of charged leptons and massive Majorana neutrinos. First, following the approach of the Hilbert series from invariant theory, we show that there are 34 basic flavor invariants in the generating set, among which 19 invariants are CP-even and the others are CP-odd. Any flavor invariant can be expressed as a polynomial of those 34 basic invariants in the generating set. Second, we explicitly construct all the basic invariants and derive their RGEs, which form a closed system of differential equations as they should. The numerical solutions to the RGEs of the basic flavor invariants have also been found. Furthermore, we demonstrate how to extract physical observables from the basic invariants. Our study is helpful for understanding the algebraic structure of flavor invariants in the leptonic sector, and also provides a novel way to explore leptonic flavor structures.
In this paper, we propose a generic model transfer scheme to make Convolutional Neural Networks (CNNs) interpretable, while maintaining their high classification accuracy. We achieve this by building a differentiable decision forest on top of CNNs, which enjoys two characteristics: 1) During training, the tree hierarchies of the forest are learned in a top-down manner under the guidance of the category semantics embedded in the pre-trained CNN weights; 2) During inference, a single decision tree is dynamically selected from the forest for each input sample, enabling the transferred model to make sequential decisions corresponding to the attributes shared by semantically similar categories, rather than directly performing flat classification. We name the transferred model deep Dynamic Sequential Decision Forest (dDSDF). Experimental results show that dDSDF not only achieves higher classification accuracy than its counterpart, i.e., the original CNN, but also has much better interpretability: qualitatively it has plausible hierarchies, and quantitatively it leads to more precise saliency maps.
133 - Yilin Wang , Jiayi Ye 2021
Video classification and analysis has long been a popular and challenging field in computer vision. It goes beyond simple image classification, because the correlation between the semantic contents of subsequent frames makes video analysis more difficult. In this literature review, we summarize some state-of-the-art methods for multi-label video classification. Our goal is first to experimentally study the architectures in wide current use, and then to develop a method that handles the sequential data of frames and performs multi-label classification based on automatic content detection of video.
Recently, intensive studies have revealed fascinating physics, such as charge density wave and superconducting states, in the newly synthesized kagome-lattice materials $A$V$_3$Sb$_5$ ($A$=K, Rb, Cs). Despite the rapid progress, fundamental aspects like the magnetic properties and electronic correlations in these materials have not yet been clearly understood. Here, based on density functional theory plus single-site dynamical mean-field theory calculations, we investigate the correlated electronic structure and the magnetic properties of the KV$_3$Sb$_5$ family of materials in the normal state. We show that these materials are good metals with weak local correlations. The obtained Pauli-like paramagnetism and the absence of local moments are consistent with recent experiments. We reveal that the band crossings around the Fermi level form three groups of nodal lines protected by the spacetime inversion symmetry, each carrying a quantized $\pi$ Berry phase. Our result suggests that the local correlation strength in these materials appears to be too weak to generate unconventional superconductivity, and that non-local electronic correlation might be crucial in this kagome system.
84 - Yilin Wang 2021
These notes survey the first and recent results on large deviations of Schramm-Loewner evolutions (SLE) with the emphasis on interrelations among rate functions and applications to complex analysis. More precisely, we describe the large deviations of SLE$_\kappa$ when the $\kappa$ parameter goes to zero in the chordal and multichordal case and to infinity in the radial case. The rate functions, namely Loewner and Loewner-Kufarev energies, are closely related to the Weil-Petersson class of quasicircles and real rational functions.
Blind or no-reference video quality assessment of user-generated content (UGC) has become a trending, challenging, unsolved problem. Accurate and efficient video quality predictors suitable for this content are thus in great demand to achieve more intelligent analysis and processing of UGC videos. Previous studies have shown that natural scene statistics and deep learning features are both sufficient to capture spatial distortions, which contribute to a significant aspect of UGC video quality issues. However, these models are either incapable or inefficient for predicting the quality of complex and diverse UGC videos in practical applications. Here we introduce an effective and efficient video quality model for UGC content, which we dub the Rapid and Accurate Video Quality Evaluator (RAPIQUE), and which we show performs comparably to state-of-the-art (SOTA) models but with orders-of-magnitude faster runtime. RAPIQUE combines and leverages the advantages of both quality-aware scene statistics features and semantics-aware deep convolutional features, allowing us to design the first general and efficient spatial and temporal (space-time) bandpass statistics model for video quality modeling. Our experimental results on recent large-scale UGC video quality databases show that RAPIQUE delivers top performance on all the datasets at a considerably lower computational expense. We hope this work promotes and inspires further efforts towards practical modeling of video quality problems for potential real-time and low-latency applications. To promote public usage, an implementation of RAPIQUE has been made freely available online: https://github.com/vztu/RAPIQUE.
Compared with common image segmentation tasks targeted at low-resolution images, detailed segmentation of higher-resolution images receives much less attention. In this paper, we propose and study a task named Meticulous Object Segmentation (MOS), which is focused on segmenting well-defined foreground objects with elaborate shapes in high-resolution images (e.g. 2k - 4k). To this end, we propose MeticulousNet, which leverages a dedicated decoder to capture the object boundary details. Specifically, we design a Hierarchical Point-wise Refining (HierPR) block to better delineate object boundaries, and reformulate the decoding process as a recursive coarse-to-fine refinement of the object mask. To evaluate segmentation quality near object boundaries, we propose the Meticulosity Quality (MQ) score, which considers both the mask coverage and boundary precision. In addition, we collect a MOS benchmark dataset including 600 high-quality images with complex objects. We provide comprehensive empirical evidence showing that MeticulousNet can reveal pixel-accurate segmentation boundaries and is superior to state-of-the-art methods for high-resolution object segmentation tasks.
74 - Shikun Liu , Zhe Lin , Yilin Wang 2020
We present a novel resizing module for neural networks: shape adaptor, a drop-in enhancement built on top of traditional resizing layers, such as pooling, bilinear sampling, and strided convolution. Whilst traditional resizing layers have fixed and deterministic reshaping factors, our module allows for a learnable reshaping factor. Our implementation enables shape adaptors to be trained end-to-end without any additional supervision, through which network architectures can be optimised for each individual task, in a fully automated way. We performed experiments across seven image classification datasets, and results show that by simply using a set of our shape adaptors instead of the original resizing layers, performance increases consistently over human-designed networks, across all datasets. Additionally, we show the effectiveness of shape adaptors on two other applications: network compression and transfer learning. The source code is available at: https://github.com/lorenmt/shape-adaptor.
We derive the large deviation principle for radial Schramm-Loewner evolution ($\operatorname{SLE}$) on the unit disk with parameter $\kappa \rightarrow \infty$. Restricting to the time interval $[0,1]$, the good rate function is finite only on a certain family of Loewner chains driven by absolutely continuous probability measures $\{\phi_t^2(\zeta)\, d\zeta\}_{t \in [0,1]}$ on the unit circle and equals $\int_0^1 \int_{S^1} |\phi_t|^2/2\, d\zeta\, dt$. Our proof relies on the large deviation principle for the long-time average of the Brownian occupation measure by Donsker and Varadhan.