ترغب بنشر مسار تعليمي؟ اضغط هنا

151 - Defang Chen , Can Wang , Yan Feng 2021
Knowledge distillation is a generalized logits matching technique for model compression. Their equivalence is previously established on the condition of $textit{infinity temperature}$ and $textit{zero-mean normalization}$. In this paper, we prove tha t with only $textit{infinity temperature}$, the effect of knowledge distillation equals to logits matching with an extra regularization. Furthermore, we reveal that an additional weaker condition -- $textit{equal-mean initialization}$ rather than the original $textit{zero-mean normalization}$ already suffices to set up the equivalence. The key to our proof is we realize that in modern neural networks with the cross-entropy loss and softmax activation, the mean of back-propagated gradient on logits always keeps zero.
137 - Dong Ding , Can Wang , Ying-Qiu He 2021
Wigners friend thought experiment is intended to reveal the inherent tension between unitary evolution and measurement collapse. On the basis of Wigners friend experiment, Brukner derives a no-go theorem for observer-independent facts. We construct a n extended Wigners friend scenario including three laboratories, namely, Alices laboratory, Bobs laboratory and Charlies laboratory, where Alice, Bob and Charlie are standing outside the laboratories while their friends are placed inside their own laboratories. We consider quantum simulation via Q# quantum programming and also realize the primary quantum circuits using IBM quantum computers. Then, we calculate the probabilities and corresponding statistical uncertainties. It has been shown that the results of quantum simulation are clearly consistent with theoretical values, while it has a slightly higher error rates for the experimental results of quantum computers mainly because of a series of quantum gates, especially CNOT gates.
Simulations of high-complexity quantum systems, which are intractable for classical computers, can be efficiently done with quantum computers. Similarly, the increasingly complex quantum electronic circuits themselves will also need efficient simulat ions on quantum computers, which in turn will be important in quantum-aided design for next-generation quantum processors. Here, we implement variational quantum eigensolvers to simulate a Josephson-junction-array quantum circuit, which leads to the discovery of a new type of high-performance qubit, plasonium. We fabricate this new qubit and demonstrate that it exhibits not only long coherence time and high gate fidelity, but also a shrinking physical size and larger anharmonicity than the transmon, which can offer a number of advantages for scaling up multi-qubit devices. Our work opens the way to designing advanced quantum processors using existing quantum computing resources.
85 - Fangzhou Han , Can Wang , Hao Du 2021
Despite recent breakthroughs in deep learning methods for image lighting enhancement, they are inferior when applied to portraits because 3D facial information is ignored in their models. To address this, we present a novel deep learning framework fo r portrait lighting enhancement based on 3D facial guidance. Our framework consists of two stages. In the first stage, corrected lighting parameters are predicted by a network from the input bad lighting image, with the assistance of a 3D morphable model and a differentiable renderer. Given the predicted lighting parameter, the differentiable renderer renders a face image with corrected shading and texture, which serves as the 3D guidance for learning image lighting enhancement in the second stage. To better exploit the long-range correlations between the input and the guidance, in the second stage, we design an image-to-image translation network with a novel transformer architecture, which automatically produces a lighting-enhanced result. Experimental results on the FFHQ dataset and in-the-wild images show that the proposed method outperforms state-of-the-art methods in terms of both quantitative metrics and visual quality. We will publish our dataset along with more results on https://cassiepython.github.io/egsr/index.html.
Graph Neural Network (GNN) has been demonstrated its effectiveness in dealing with non-Euclidean structural data. Both spatial-based and spectral-based GNNs are relying on adjacency matrix to guide message passing among neighbors during feature aggre gation. Recent works have mainly focused on powerful message passing modules, however, in this paper, we show that none of the message passing modules is necessary. Instead, we propose a pure multilayer-perceptron-based framework, Graph-MLP with the supervision signal leveraging graph structure, which is sufficient for learning discriminative node representation. In model-level, Graph-MLP only includes multi-layer perceptrons, activation function, and layer normalization. In the loss level, we design a neighboring contrastive (NContrast) loss to bridge the gap between GNNs and MLPs by utilizing the adjacency information implicitly. This design allows our model to be lighter and more robust when facing large-scale graph data and corrupted adjacency information. Extensive experiments prove that even without adjacency information in testing phase, our framework can still reach comparable and even superior performance against the state-of-the-art models in the graph node classification task.
Face image manipulation via three-dimensional guidance has been widely applied in various interactive scenarios due to its semantically-meaningful understanding and user-friendly controllability. However, existing 3D-morphable-model-based manipulatio n methods are not directly applicable to out-of-domain faces, such as non-photorealistic paintings, cartoon portraits, or even animals, mainly due to the formidable difficulties in building the model for each specific face domain. To overcome this challenge, we propose, as far as we know, the first method to manipulate faces in arbitrary domains using human 3DMM. This is achieved through two major steps: 1) disentangled mapping from 3DMM parameters to the latent space embedding of a pre-trained StyleGAN2 that guarantees disentangled and precise controls for each semantic attribute; and 2) cross-domain adaptation that bridges domain discrepancies and makes human 3DMM applicable to out-of-domain faces by enforcing a consistent latent space embedding. Experiments and comparisons demonstrate the superiority of our high-quality semantic manipulation method on a variety of face domains with all major 3D facial attributes controllable: pose, expression, shape, albedo, and illumination. Moreover, we develop an intuitive editing interface to support user-friendly control and instant feedback. Our project page is https://cassiepython.github.io/sigasia/cddfm3d.html.
Standard quantum mechanics has been formulated with complex-valued Schrodinger equations, wave functions, operators, and Hilbert spaces. However, previous work has shown possible to simulate quantum systems using only real numbers by adding extra qub its and exploiting an enlarged Hilbert space. A fundamental question arises: are the complex numbers really necessary for the quantum mechanical description of nature? To answer this question, a non-local game has been developed to reveal a contradiction between a multiqubit quantum experiment and a player using only real numbers. Here, based on deterministic and high-fidelity entanglement swapping with superconducting qubits, we experimentally implement the Bell-like game and observe a quantum score of 8.09(1), which beats the real number bound of 7.66 by 43 standard deviations. Our results disprove the real-number description of nature and establish the indispensable role of complex numbers in quantum mechanics.
Pre-trained contextual vision-and-language (V&L) models have achieved impressive performance on various benchmarks. However, existing models require a large amount of parallel image-caption data for pre-training. Such data are costly to collect and r equire cumbersome curation. Inspired by unsupervised machine translation, we investigate if a strong V&L representation model can be learned through unsupervised pre-training without image-caption corpora. In particular, we propose to conduct ``mask-and-predict pre-training on text-only and image-only corpora and introduce the object tags detected by an object recognition model as anchor points to bridge two modalities. We find that such a simple approach achieves performance close to a model pre-trained with aligned data, on four English V&L benchmarks. Our work challenges the widely held notion that aligned data is necessary for V&L pre-training, while significantly reducing the amount of supervision needed for V&L models.
Scene graph generation models understand the scene through object and predicate recognition, but are prone to mistakes due to the challenges of perception in the wild. Perception errors often lead to nonsensical compositions in the output scene graph , which do not follow real-world rules and patterns, and can be corrected using commonsense knowledge. We propose the first method to acquire visual commonsense such as affordance and intuitive physics automatically from data, and use that to improve the robustness of scene understanding. To this end, we extend Transformer models to incorporate the structure of scene graphs, and train our Global-Local Attention Transformer on a scene graph corpus. Once trained, our model can be applied on any scene graph generation model and correct its obvious mistakes, resulting in more semantically plausible scene graphs. Through extensive experiments, we show our model learns commonsense better than any alternative, and improves the accuracy of state-of-the-art scene graph generation methods.
Unconstrained remote gaze estimation remains challenging mostly due to its vulnerability to the large variability in head-pose. Prior solutions struggle to maintain reliable accuracy in unconstrained remote gaze tracking. Among them, appearance-based solutions demonstrate tremendous potential in improving gaze accuracy. However, existing works still suffer from head movement and are not robust enough to handle real-world scenarios. Especially most of them study gaze estimation under controlled scenarios where the collected datasets often cover limited ranges of both head-pose and gaze which introduces further bias. In this paper, we propose novel end-to-end appearance-based gaze estimation methods that could more robustly incorporate different levels of head-pose representations into gaze estimation. Our method could generalize to real-world scenarios with low image quality, different lightings and scenarios where direct head-pose information is not available. To better demonstrate the advantage of our methods, we further propose a new benchmark dataset with the most rich distribution of head-gaze combination reflecting real-world scenarios. Extensive evaluations on several public datasets and our own dataset demonstrate that our method consistently outperforms the state-of-the-art by a significant margin.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا