The vanilla GAN (Goodfellow et al. 2014) suffers severely from mode collapse, which usually manifests as generated images that are highly similar to one another even when their corresponding latent vectors are very different. In this paper, we introduce a pluggable diversity penalty module (DPM) to alleviate mode collapse in GANs. It reduces the similarity of image pairs in feature space: if two latent vectors are different, the generator is forced to produce two images with different features. The normalized Gram matrix is used to measure the similarity. We compare the proposed method with Unrolled GAN (Metz et al. 2016), BourGAN (Xiao, Zhong, and Zheng 2018), PacGAN (Lin et al. 2018), VEEGAN (Srivastava et al. 2017) and ALI (Dumoulin et al. 2016) on a 2D synthetic dataset, and the results show that the diversity penalty module helps GANs capture many more modes of the data distribution. Further, in classification tasks, we apply this method for image data augmentation on MNIST, Fashion-MNIST and CIFAR-10, improving classification test accuracy by 0.24%, 1.34% and 0.52% over WGAN-GP (Gulrajani et al. 2017), respectively. In domain translation, the diversity penalty module helps StarGAN (Choi et al. 2018) generate more accurate attention masks and accelerates convergence. Finally, we quantitatively evaluate the proposed method with IS and FID on CelebA, CIFAR-10, MNIST and Fashion-MNIST, and the results show that a GAN with the diversity penalty module achieves much higher IS and lower FID than several SOTA GAN architectures.
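As a rough illustration of the idea, the sketch below (PyTorch) penalizes image pairs whose features are more similar than their latent codes. The exact penalty form is an assumption on our part; the abstract only states that a normalized Gram matrix measures similarity.

```python
import torch
import torch.nn.functional as F

def diversity_penalty(z, feats):
    """Hypothetical diversity penalty for a batch of generated images.

    z     : (B, dz) latent vectors fed to the generator
    feats : (B, df) features of the corresponding generated images
    """
    # Normalized Gram (cosine similarity) matrices for latents and features.
    gram_z = F.normalize(z, dim=1) @ F.normalize(z, dim=1).t()
    gram_f = F.normalize(feats, dim=1) @ F.normalize(feats, dim=1).t()
    off_diag = ~torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    # Penalize pairs whose features are more similar than their latents:
    # different latent codes should yield images with different features.
    return F.relu(gram_f[off_diag] - gram_z[off_diag]).mean()
```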
Many scenarios of physics beyond the standard model predict new light, weakly coupled degrees of freedom, populated in the early universe and remaining as cosmic relics today. Due to their high abundances, these relics can significantly affect the evolution of the universe. For instance, massless relics produce a shift $\Delta N_{\rm eff}$ to the cosmic expectation of the effective number of active neutrinos. Massive relics, on the other hand, additionally become part of the cosmological dark matter in the late universe, though their light nature allows them to freely stream out of potential wells. This produces novel signatures in the large-scale structure (LSS) of the universe, suppressing matter fluctuations at small scales. We present the first general search for such light (but massive) relics (LiMRs) with cosmic microwave background (CMB) and LSS data, scanning the 2D parameter space of their masses $m_X$ and temperatures $T_X^{(0)}$ today. In the conservative minimum-temperature ($T_X^{(0)}=0.91$ K) scenario, we rule out Weyl (and higher-spin) fermions -- such as the gravitino -- with $m_X \geq 2.26$ eV at 95% C.L., and set analogous limits of $m_X \leq 11.2,\ 1.06,\ 1.56$ eV for scalar, vector, and Dirac-fermion relics. This is the first search for LiMRs with joint CMB, weak-lensing, and full-shape galaxy data; we demonstrate that weak-lensing data is critical for breaking parameter degeneracies, while full-shape information presents a significant boost in constraining power relative to analyses with only baryon acoustic oscillation parameters. Under the combined strength of these datasets, our constraints are the tightest and most comprehensive to date.
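For orientation, the standard relation between a decoupled relic's temperature and its $\Delta N_{\rm eff}$ contribution (a textbook result, not quoted from the paper) reads:

```latex
% Contribution of a decoupled relic X with g_X internal degrees of
% freedom and temperature T_X to the effective number of neutrinos:
\Delta N_{\rm eff} \;=\; \frac{g_X}{2}\left(\frac{T_X}{T_\nu}\right)^{4}
\times
\begin{cases}
  8/7 & \text{for bosons},\\[2pt]
  1   & \text{for fermions}.
\end{cases}
```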
Cheng Jie, Da Xu, Zigeng Wang (2021)
With the increasing scale of search engine marketing, designing an efficient bidding system is becoming paramount for the success of e-commerce companies. The critical challenges faced by a modern industrial-level bidding system include: 1. the catalog is enormous, and the relevant bidding features are highly sparse; 2. the large volume of bidding requests imposes a significant computational burden on both offline and online serving. Leveraging extraneous user-item information proves essential to mitigate the sparsity issue, for which we exploit the natural language signals from the user's query and the contextual knowledge from the products. In particular, we extract vector representations of ads via the Transformer model and leverage their geometric relations to build collaborative bidding predictions via clustering. The two-step procedure also significantly reduces the computational cost of bid evaluation and optimization. In this paper, we introduce the end-to-end structure of the bidding system for search engine marketing at Walmart e-commerce, which successfully handles tens of millions of bids each day. We analyze the online and offline performance of our approach and discuss why it serves as a production-efficient solution.
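A minimal sketch of the embed-then-cluster flow described above is given below. The encoder model, the clustering choice, and the per-cluster bid aggregation are all illustrative assumptions, not the production system's components.

```python
# Offline: embed ads with a Transformer, cluster them, aggregate bids per
# cluster. Online: one encode + one nearest-centroid lookup per request.
import numpy as np
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in Transformer

def build_cluster_bids(ad_texts, historical_bids, n_clusters=100):
    """ad_texts: list[str]; historical_bids: np.ndarray of shape (N,)."""
    emb = encoder.encode(ad_texts)                  # (N, d) ad vectors
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(emb)
    cluster_bid = np.zeros(n_clusters)
    for c in range(n_clusters):
        # Collaborative bid: pool history across geometrically close ads,
        # which mitigates the feature-sparsity problem for rare items.
        cluster_bid[c] = historical_bids[km.labels_ == c].mean()
    return km, cluster_bid

def predict_bid(km, cluster_bid, ad_text):
    return cluster_bid[km.predict(encoder.encode([ad_text]))[0]]
```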
An accurate force field is key to the success of all molecular mechanics simulations of organic polymers and biomolecules. Accuracy beyond density functional theory is often needed to describe intermolecular interactions, while most correlated wavefunction (CW) methods are prohibitively expensive for large molecules. It is therefore a great challenge to develop an extensible ab initio force field for large flexible organic molecules at the CW level of accuracy. In this work, we meet this challenge by combining a physics-driven nonbonding potential with a data-driven subgraph neural network bonding model (named sGNN). Tests on polyethylene glycol polymer chains show that our strategy is highly accurate and robust for molecules of different sizes. The force field can thus be developed from small molecular fragments (with sizes easily accessible to CW methods) and safely transferred to large polymers, opening a new path toward next-generation organic force fields.
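A toy sketch of the hybrid energy decomposition is shown below. The Lennard-Jones term and the small MLP are placeholders for the paper's physics-driven nonbonding potential and subgraph neural network, respectively; the key structural point is that the learned bonded energy sums over fragments, which is what makes it transferable from small molecules to polymers.

```python
import torch
import torch.nn as nn

def lennard_jones(r, epsilon=0.2, sigma=3.4):
    """Classic 12-6 nonbonded potential (physics-driven part)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)

class BondedNet(nn.Module):
    """Stand-in for sGNN: maps per-subgraph descriptors to energies."""
    def __init__(self, n_features):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.SiLU(), nn.Linear(64, 1))

    def forward(self, subgraph_feats):      # (n_subgraphs, n_features)
        # Total bonded energy = sum of fragment energies, so a model
        # trained on small fragments extends to long chains.
        return self.net(subgraph_feats).sum()

def total_energy(pair_dists, subgraph_feats, bonded_net):
    return lennard_jones(pair_dists).sum() + bonded_net(subgraph_feats)
```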
Chinese Spell Checking (CSC) aims to detect and correct erroneous characters in user-generated Chinese text. Most Chinese spelling errors are misuses of semantically, phonetically, or graphically similar characters. Previous attempts noticed this phenomenon and tried to exploit such similarity for the task. However, those methods use either heuristics or handcrafted confusion sets to predict the correct character. In this paper, we propose a Chinese spell checker called ReaLiSe that directly leverages the multimodal information of Chinese characters. The ReaLiSe model tackles the CSC task by (1) capturing the semantic, phonetic and graphic information of the input characters, and (2) selectively mixing the information in these modalities to predict the correct output. Experiments on the SIGHAN benchmarks show that the proposed model outperforms strong baselines by a large margin.
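One way the selective mixing in step (2) could look is a per-character gate over the three modality encodings, sketched below. The dimensions, vocabulary size, and gating form are assumptions for illustration, not ReaLiSe's actual architecture.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, d=768, vocab=21128):   # BERT-Chinese-like sizes
        super().__init__()
        self.gate = nn.Linear(3 * d, 3)        # one weight per modality
        self.out = nn.Linear(d, vocab)         # predict the correct char

    def forward(self, sem, pho, gra):          # each: (batch, seq, d)
        stacked = torch.stack([sem, pho, gra], dim=-2)        # (B, S, 3, d)
        # Per-position softmax decides how much each modality contributes.
        w = torch.softmax(self.gate(torch.cat([sem, pho, gra], -1)), -1)
        fused = (w.unsqueeze(-1) * stacked).sum(-2)           # (B, S, d)
        return self.out(fused)                 # logits over the vocabulary
```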
Exceptional points (EPs) are exotic degeneracies of non-Hermitian systems, where the eigenvalues and the corresponding eigenvectors simultaneously coalesce in parameter space, and these degeneracies are sensitive to tiny perturbations on the system. Here we report an experimental observation of the EP in a hybrid quantum system consisting of dense nitrogen (P1) centers in diamond coupled to a coplanar-waveguide resonator. These P1 centers can be divided into three subensembles of spins, and cross relaxation occurs among them. As a new method to demonstrate this EP, we pump a given spin subensemble with a drive field to tune the magnon-photon coupling in a wide range. We observe the EP in the middle spin subensemble coupled to the resonator mode, irrespective of which spin subensemble is actually driven. This robustness of the EP against pumping reveals the key role of the cross relaxation in P1 centers. It offers a novel way to convincingly prove the existence of the cross-relaxation effect via the EP.
Sequential deep learning models such as RNNs, causal CNNs and attention mechanisms do not readily consume continuous-time information. Discretizing the temporal data, as we show, causes inconsistency even for simple continuous-time processes. Current approaches often handle time heuristically so as to stay compatible with existing deep learning architectures and implementations. In this paper, we provide a principled way to characterize continuous-time systems using deep learning tools. Notably, the proposed approach applies to all the major deep learning architectures and requires little modification to the implementation. The critical insight is to represent the continuous-time system by composing neural networks with a temporal kernel, an intuition drawn from recent advances in understanding deep learning via Gaussian processes and the neural tangent kernel. To represent the temporal kernel, we introduce the random feature approach and convert the kernel learning problem to spectral density estimation under reparameterization. We further prove convergence and consistency results even when the temporal kernel is non-stationary and the spectral density is misspecified. Simulations and real-data experiments demonstrate the empirical effectiveness of our temporal kernel approach in a broad range of settings.
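A minimal sketch of the random-feature idea follows: by Bochner's theorem, sampling frequencies from a spectral density and taking cosines yields features whose inner product approximates a stationary kernel; reparameterizing the samples lets gradients flow to the density's parameters. The Gaussian spectral density below is an illustrative assumption, not the paper's parameterization.

```python
import torch
import torch.nn as nn

class TemporalKernelFeatures(nn.Module):
    """phi(t) with phi(t1)·phi(t2) ≈ k(t1 - t2), usable as extra inputs
    to any sequence model in place of heuristic time encodings."""
    def __init__(self, n_features=64):
        super().__init__()
        self.log_scale = nn.Parameter(torch.zeros(1))  # spectral bandwidth
        self.register_buffer("eps", torch.randn(n_features))
        self.register_buffer("phase", 2 * torch.pi * torch.rand(n_features))

    def forward(self, t):                              # t: (batch, 1)
        # Reparameterization trick: omega = eps * exp(log_scale), so the
        # frequency samples are differentiable in the density parameter.
        omega = self.eps * self.log_scale.exp()
        feats = torch.cos(t * omega + self.phase)      # (batch, n_features)
        return feats * (2.0 / self.eps.numel()) ** 0.5
```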
Da Xu, Yuting Ye, Chuanwei Ruan (2021)
The recent paper by Byrd & Lipton (2019), based on empirical observations, raises a major concern about the impact of importance weighting on over-parameterized deep learning models. They observe that as long as the model can separate the training data, the impact of importance weighting diminishes as training proceeds. Nevertheless, a rigorous characterization of this phenomenon has been lacking. In this paper, we provide formal characterizations and theoretical justifications of the role of importance weighting with respect to the implicit bias of gradient descent and margin-based learning theory. We reveal both the optimization dynamics and the generalization performance under deep learning models. Our work not only explains the various novel phenomena observed for importance weighting in deep learning, but also extends to settings where the weights are optimized as part of the model, which applies to a number of topics under active research.
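The observed phenomenon is easy to reproduce in a toy setting: on linearly separable data, gradient descent on the logistic loss drifts toward the max-margin direction, which importance weights do not change, so weighted and unweighted runs end up pointing nearly the same way. The data and weights below are illustrative.

```python
import torch

torch.manual_seed(0)
x = torch.cat([torch.randn(50, 2) + 3, torch.randn(50, 2) - 3])
y = torch.cat([torch.ones(50), -torch.ones(50)])     # separable classes
imp = torch.cat([10 * torch.ones(50), torch.ones(50)])  # 10x class weight

def train(weights, steps=10000, lr=0.1):
    w = torch.zeros(2, requires_grad=True)
    for _ in range(steps):
        loss = (weights * torch.nn.functional.softplus(-(x @ w) * y)).mean()
        loss.backward()
        with torch.no_grad():
            w -= lr * w.grad
            w.grad = None
    return w.detach() / w.detach().norm()            # direction only

print(train(torch.ones(100)))  # unweighted direction
print(train(imp))              # weighted: nearly the same direction
```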
Product embeddings have been heavily investigated in the past few years, serving as the cornerstone of a broad range of machine learning applications in e-commerce. Despite their empirical success, little is known about how and why product embeddings work from a theoretical standpoint. Analogous results from natural language processing (NLP) often rely on domain-specific properties that do not transfer to the e-commerce setting, and the downstream tasks often focus on different aspects of the embeddings. We take an e-commerce-oriented view of product embeddings and present a complete theoretical picture from both the representation learning and the learning theory perspectives. We prove that product embeddings trained by the widely adopted skip-gram negative sampling algorithm and its variants are sufficient dimension reductions with respect to a critical product relatedness measure. The generalization performance in downstream machine learning tasks is controlled by the alignment between the embeddings and the product relatedness measure. Following these theoretical discoveries, we conduct exploratory experiments that support our theoretical insights about the product embeddings.
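For readers unfamiliar with the training objective under analysis, below is a minimal skip-gram negative sampling (SGNS) loss over product pairs. Treating "co-occurrence" as products interacted with in the same session is our assumption for the e-commerce setting.

```python
import torch
import torch.nn as nn

class SGNS(nn.Module):
    def __init__(self, n_products, dim=64):
        super().__init__()
        self.center = nn.Embedding(n_products, dim)   # product embeddings
        self.context = nn.Embedding(n_products, dim)

    def forward(self, center_ids, pos_ids, neg_ids):
        """center_ids, pos_ids: (B,); neg_ids: (B, k) sampled negatives."""
        c = self.center(center_ids)                    # (B, d)
        pos = (c * self.context(pos_ids)).sum(-1)      # (B,)
        neg = torch.bmm(self.context(neg_ids),         # (B, k, d)
                        c.unsqueeze(-1)).squeeze(-1)   # (B, k)
        # Maximize co-occurring pair scores, minimize negative-pair scores.
        return -(nn.functional.logsigmoid(pos).mean()
                 + nn.functional.logsigmoid(-neg).mean())
```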
Most theoretical studies explaining the regularization effect in deep learning have focused only on gradient descent with a sufficiently small learning rate, or even gradient flow (infinitesimal learning rate). Such studies, however, neglect the reasonably large learning rates used in most practical applications. In this work, we characterize the implicit bias of deep linear networks for binary classification with the logistic loss in the large learning rate regime, inspired by the seminal work of Lewkowycz et al. [26] in a regression setting with squared loss. They found a learning rate regime with a large step size, named the catapult phase, where the loss grows in the early stage of training and eventually converges to a minimum that is flatter than those found in the small learning rate regime. We claim that, depending on the separation conditions of the data, the gradient descent iterates converge to a flatter minimum in the catapult phase. We rigorously prove this claim under the assumption of degenerate data by overcoming the difficulty of the non-constant Hessian of the logistic loss, and we further characterize the behavior of the loss and Hessian for non-separable data. Finally, we demonstrate empirically that flatter minima in the space spanned by non-separable data, together with learning rates in the catapult phase, can lead to better generalization.
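The catapult phase itself can be seen in a few lines, in the squared-loss regression setting of Lewkowycz et al. that this work builds on: a two-layer linear model f(x) = u*v*x on one example. With a small step the curvature u^2 + v^2 stays near its initial value; with a step in the catapult range the loss grows at first and then settles into a visibly flatter minimum. All numbers are illustrative.

```python
def run(lr, steps=200, u=2.0, v=1.0):
    # Squared loss on the single example (x=1, y=1); initial
    # curvature (NTK) is u*u + v*v = 5, so 2/lambda = 0.4.
    for _ in range(steps):
        r = u * v - 1.0                    # residual
        u, v = u - lr * v * r, v - lr * u * r
    return 0.5 * (u * v - 1.0) ** 2, u * u + v * v

for lr in (0.05, 0.6):                     # small step vs catapult phase
    loss, curvature = run(lr)
    # lr=0.6 converges to a flatter minimum (smaller curvature).
    print(f"lr={lr}: loss={loss:.2e}, curvature={curvature:.2f}")
```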