Cycle-consistent generative adversarial networks (CycleGANs) have shown promising performance for speech enhancement (SE), but one intractable shortcoming of CycleGAN-based SE systems is that the noise components propagate throughout the cycle and cannot be completely eliminated. Additionally, conventional CycleGAN-based SE systems only estimate the spectral magnitude and leave the phase unaltered. Motivated by the multi-stage learning concept, we propose a novel two-stage denoising system that combines a CycleGAN-based magnitude-enhancing network with a subsequent complex spectral refining network. Specifically, in the first stage, a CycleGAN-based model estimates only the magnitude, which is then coupled with the original noisy phase to obtain a coarsely enhanced complex spectrum. The second stage then further suppresses the residual noise components and estimates the clean phase with a complex spectral mapping network, a purely complex-valued network composed of complex 2D convolution/deconvolution and complex temporal-frequency attention blocks. Experimental results on two public datasets demonstrate that the proposed approach consistently surpasses previous one-stage CycleGANs and other state-of-the-art SE systems across various evaluation metrics, especially in background noise suppression.
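The coupling step between the two stages can be illustrated with a minimal sketch. It assumes the spectra are stored as NumPy arrays; the function name and the scaled magnitude standing in for the first-stage network output are hypothetical and are not taken from the paper.

```python
import numpy as np

def couple_magnitude_with_noisy_phase(enhanced_mag, noisy_spec):
    """Combine an enhanced magnitude with the phase of the noisy spectrum."""
    noisy_phase = np.angle(noisy_spec)               # phase is left unaltered
    return enhanced_mag * np.exp(1j * noisy_phase)   # coarse complex spectrum

# Toy complex spectrogram (frequency bins x frames), random for illustration.
rng = np.random.default_rng(0)
noisy_spec = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))

# Hypothetical stand-in for the stage-1 magnitude estimate.
enhanced_mag = 0.5 * np.abs(noisy_spec)

coarse_spec = couple_magnitude_with_noisy_phase(enhanced_mag, noisy_spec)
```

The resulting array has the enhanced magnitude and the noisy phase, which is the kind of input a second-stage complex refinement network would receive.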
Adopting a binned method, we model-independently reconstruct the mass function of primordial black holes (PBHs) from GWTC-2 and find that such a PBH mass function can be explained by a broad red-tilted power spectrum of curvature perturbations. Even though GW190521, with component masses in the upper mass gap $(m>65M_\odot)$, can be naturally interpreted in the PBH scenario, the events with component masses in the light mass range $(m<3M_\odot)$ (including GW190814, GW190425, GW200105, and GW200115) are quite unlikely to be explained by binary PBHs, despite the absence of electromagnetic counterparts, because the corresponding PBH merger rates are much smaller than those given by LIGO-Virgo. Furthermore, we predict that both the gravitational-wave (GW) background generated by the binary PBHs and the scalar-induced GWs accompanying the formation of PBHs should be detected by ground-based and space-borne GW detectors and pulsar timing arrays in the future.
Yulin Li, Yuxi Qian, Yuchen Yu, 2021
Structured text understanding on Visually Rich Documents (VRDs) is a crucial part of Document Intelligence. Due to the complexity of content and layout in VRDs, structured text understanding has been a challenging task. Most existing studies decouple this problem into two sub-tasks, entity labeling and entity linking, which require a complete understanding of the document context at both the token and segment levels. However, little work has addressed solutions that efficiently extract structured data from these different levels. This paper proposes a unified framework named StrucTexT, which is flexible and effective for handling both sub-tasks. Specifically, based on the transformer, we introduce a segment-token aligned encoder to deal with the entity labeling and entity linking tasks at different levels of granularity. Moreover, we design a novel pre-training strategy with three self-supervised tasks to learn a richer representation. StrucTexT uses the existing Masked Visual Language Modeling task and the new Sentence Length Prediction and Paired Boxes Direction tasks to incorporate multi-modal information across text, image, and layout. We evaluate our method for structured text understanding at the segment level and the token level and show that it significantly outperforms state-of-the-art counterparts on the FUNSD, SROIE, and EPHOIE datasets.
An ultra-compact one-dimensional topological photonic crystal (1D-TPC) is designed in a single-mode silicon bus waveguide to generate a Fano resonance lineshape. The Fano resonance comes from the interference between the discrete topological boundary state of the 1D-TPC and the continuum high-order leaky mode of the bus waveguide. Standalone asymmetric Fano resonance lineshapes are obtained experimentally in the waveguide transmission spectrum with a maximum extinction ratio of 33 dB and a slope ratio of 10 dB/nm over a broadband flat background.
In magic-angle twisted bilayer graphene (MATBG), the moiré superlattice potential gives rise to narrow electronic bands which support a multitude of many-body quantum phases. Further richness arises in the presence of a perpendicular magnetic field, where the interplay between moiré and magnetic length scales leads to fractal Hofstadter subbands. In this strongly correlated Hofstadter platform, multiple experiments have identified gapped topological and correlated states, but little is known about the phase transitions between them in the intervening compressible regimes. Here, using a scanning single-electron transistor microscope to measure local electronic compressibility, we simultaneously unveil novel sequences of broken-symmetry Chern insulators (CIs) and resolve sharp phase transitions between competing states with different topological quantum numbers and spin/valley flavor occupations. Our measurements provide a complete experimental mapping of the energy spectrum and thermodynamic phase diagram of interacting Hofstadter subbands in MATBG. In addition, we observe full lifting of the degeneracy of the zeroth Landau levels (zLLs) together with level crossings, indicating moiré valley splitting. We propose a unified flavor polarization mechanism to understand the intricate interplay of topology, interactions, and symmetry breaking as a function of density and applied magnetic field in this system.
This paper studies the PML method for wave scattering in a half space of homogeneous medium bounded by a two-dimensional, perfectly conducting, and locally defected periodic surface, and develops a high-accuracy boundary-integral-equation (BIE) solver. Along the vertical direction, we place a PML to truncate the unbounded domain onto a strip and prove that the PML solution converges linearly to the true solution in the physical subregion of the strip with the PML thickness. Laterally, we divide the unbounded strip into three regions: a region containing the defect and two semi-waveguide regions, separated by two vertical line segments. In both semi-waveguides, we prove the well-posedness of an associated scattering problem so that a Neumann-to-Dirichlet (NtD) operator is well defined on the associated vertical segment. The two NtD operators, serving as exact lateral boundary conditions, reformulate the unbounded strip problem as a boundary value problem on the defected region. Due to the periodicity of the semi-waveguides, both NtD operators turn out to be closely related to a Neumann-marching operator, governed by a nonlinear Riccati equation. It is proved that the Neumann-marching operators are contracting, so that the PML solution decays exponentially fast along both lateral directions. This has two opposite consequences. Negatively, the PML solution cannot converge exponentially to the true solution in the whole physical region of the strip. Positively, from a numerical perspective, the Riccati equations can now be efficiently solved by a recursive doubling procedure and a high-accuracy PML-based BIE method, so that the boundary value problem on the defected region can be solved efficiently and accurately. Numerical experiments demonstrate that the PML solution converges exponentially fast to the true solution in any compact subdomain of the strip.
Non-parallel training is a difficult but essential task for DNN-based speech enhancement methods, owing to the lack of adequate paired noisy and clean speech corpora in many real scenarios. In this paper, we propose a novel adaptive attention-in-attention CycleGAN (AIA-CycleGAN) for non-parallel speech enhancement. In previous CycleGAN-based non-parallel speech enhancement methods, the limited mapping ability of the generator may cause performance degradation and insufficient feature learning. To alleviate this degradation, we propose an integration of adaptive time-frequency attention (ATFA) and adaptive hierarchical attention (AHA) to form an attention-in-attention (AIA) module for more flexible feature learning during the mapping procedure. More specifically, ATFA can capture long-range temporal-spectral contextual information for more effective feature representations, while AHA can flexibly aggregate the intermediate output feature maps of different ATFAs using adaptive attention weights that depend on the global context. Numerous experimental results demonstrate that the proposed approach achieves consistently superior performance over previous GAN-based and CycleGAN-based methods in non-parallel training. Moreover, experiments in parallel training verify that the proposed AIA-CycleGAN also outperforms most advanced GAN-based and non-GAN-based speech enhancement approaches, especially in maintaining speech integrity and reducing speech distortion.
Image inpainting aims to complete the missing or corrupted regions of images with realistic contents. The prevalent approaches adopt a hybrid objective of reconstruction and perceptual quality by using generative adversarial networks. However, the reconstruction loss and adversarial loss focus on synthesizing contents of different frequencies, and simply applying them together often leads to inter-frequency conflicts and compromised inpainting. This paper presents WaveFill, a wavelet-based inpainting network that decomposes images into multiple frequency bands and fills the missing regions in each frequency band separately and explicitly. WaveFill decomposes images by using the discrete wavelet transform (DWT), which preserves spatial information naturally. It applies an L1 reconstruction loss to the decomposed low-frequency bands and an adversarial loss to the high-frequency bands, thereby effectively mitigating inter-frequency conflicts while completing images in the spatial domain. To address the inpainting inconsistency in different frequency bands and fuse features with distinct statistics, we design a novel normalization scheme that aligns and fuses the multi-frequency features effectively. Extensive experiments over multiple datasets show that WaveFill achieves superior image inpainting qualitatively and quantitatively.
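To make the frequency-band decomposition concrete, the following is a minimal sketch of a single-level 2D DWT and its inverse. It assumes the Haar wavelet and a single-channel image with even height and width; the abstract does not specify the wavelet, and the function names and sub-band labels here are illustrative only.

```python
import numpy as np

def haar_dwt2(img):
    """Single-level 2D Haar DWT; returns one low- and three high-frequency bands."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]   # top-left, top-right of each 2x2 block
    c, d = img[1::2, 0::2], img[1::2, 1::2]   # bottom-left, bottom-right
    ll = (a + b + c + d) / 2                  # low-frequency approximation
    lh = (a + b - c - d) / 2                  # difference between rows
    hl = (a - b + c - d) / 2                  # difference between columns
    hh = (a - b - c + d) / 2                  # diagonal difference
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2; reassembles the image from its four sub-bands."""
    h, w = ll.shape
    img = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    img[0::2, 0::2] = (ll + lh + hl + hh) / 2
    img[0::2, 1::2] = (ll + lh - hl - hh) / 2
    img[1::2, 0::2] = (ll - lh + hl - hh) / 2
    img[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return img

rng = np.random.default_rng(0)
image = rng.random((8, 8))                    # toy image for illustration
ll, lh, hl, hh = haar_dwt2(image)
reconstructed = haar_idwt2(ll, lh, hl, hh)
```

Each sub-band is a half-resolution image that keeps the spatial layout of the input, so different losses could in principle be attached to the low-frequency band and to the high-frequency bands before the inverse transform returns to the spatial domain.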
Generative adversarial networks (GANs) have achieved great success in image translation and manipulation. However, high-fidelity image generation with faithful style control remains a grand challenge in computer vision. This paper presents a versatile image translation and manipulation framework that achieves accurate semantic and style guidance in image generation by explicitly building a correspondence. To handle the quadratic complexity incurred by building dense correspondences, we introduce a bi-level feature alignment strategy that adopts a top-$k$ operation to rank block-wise features, followed by dense attention between block features, which reduces memory cost substantially. As the top-$k$ operation involves index swapping, which precludes gradient propagation, we propose to approximate the non-differentiable top-$k$ operation with a regularized earth mover's problem so that its gradient can be effectively back-propagated. In addition, we design a novel semantic position encoding mechanism that builds up coordinates for each individual semantic region to preserve texture structures while building correspondences. Further, we design a novel confidence feature injection module which mitigates the mismatch problem by fusing features adaptively according to the reliability of the built correspondences. Extensive experiments show that our method achieves superior performance qualitatively and quantitatively compared with the state of the art. The code is available at https://github.com/fnzhan/RABIT.
Despite the great success of GANs in image translation with different conditioned inputs such as semantic segmentation and edge maps, generating high-fidelity realistic images with reference styles remains a grand challenge in conditional image-to-image translation. This paper presents a general image translation framework that incorporates optimal transport for feature alignment between conditional inputs and style exemplars in image translation. The introduction of optimal transport significantly mitigates the constraint of many-to-one feature matching while building up accurate semantic correspondences between conditional inputs and exemplars. We design a novel unbalanced optimal transport to address the transport between features with deviational distributions, which exist widely between conditional inputs and exemplars. In addition, we design a semantic-activation normalization scheme that successfully injects style features of exemplars into the image translation process. Extensive experiments over multiple image translation tasks show that our method achieves superior image translation qualitatively and quantitatively compared with the state of the art.
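As a rough illustration of transport-based feature alignment, the sketch below computes an entropy-regularized transport plan with standard (balanced) Sinkhorn iterations and uses it to warp exemplar features toward the conditional-input positions. This is a simplification and not the unbalanced formulation the abstract describes; the feature sizes, the cosine cost, the regularization strength `eps`, and all names are assumptions made for the example.

```python
import numpy as np

def sinkhorn_plan(cost, a, b, eps=0.5, n_iters=200):
    """Balanced entropic optimal transport via Sinkhorn scaling iterations."""
    kernel = np.exp(-cost / eps)              # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)                # scale columns toward marginal b
        u = a / (kernel @ v)                  # scale rows toward marginal a
    return u[:, None] * kernel * v[None, :]   # transport plan

rng = np.random.default_rng(0)
cond_feats = rng.standard_normal((6, 16))       # toy conditional-input features
exemplar_feats = rng.standard_normal((8, 16))   # toy exemplar features

# Cosine distance between every conditional and exemplar feature.
cond_unit = cond_feats / np.linalg.norm(cond_feats, axis=1, keepdims=True)
exem_unit = exemplar_feats / np.linalg.norm(exemplar_feats, axis=1, keepdims=True)
cost = 1.0 - cond_unit @ exem_unit.T

a = np.full(6, 1.0 / 6)                       # uniform mass on conditional features
b = np.full(8, 1.0 / 8)                       # uniform mass on exemplar features
plan = sinkhorn_plan(cost, a, b)

# Each conditional position receives a plan-weighted average of exemplar features.
warped = (plan / plan.sum(axis=1, keepdims=True)) @ exemplar_feats
```

Because the column marginal is constrained, no single exemplar feature can absorb all the matches, which is one way to read the abstract's remark about relaxing many-to-one matching.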