Research papers, master's and doctoral theses published by Jie Wu

Everything Is All It Takes: A Multipronged Strategy for Zero-Shot Cross-Lingual Information Extraction

87 - Mahsa Yarmohammadi , Shijie Wu , Marc Marone 2021

Zero-shot cross-lingual information extraction (IE) describes the construction of an IE model for some target language, given existing annotations exclusively in some other language, typically English. While the advance of pretrained multilingual encoders suggests an easy optimism of "train on English, run on any language", we find through a thorough exploration and extension of techniques that a combination of approaches, both new and old, leads to better performance than any one cross-lingual strategy in particular. We explore techniques including data projection and self-training, and how different pretrained encoders impact them. We use English-to-Arabic IE as our initial example, demonstrating strong performance in this setting for event extraction, named entity recognition, part-of-speech tagging, and dependency parsing. We then apply data projection and self-training to three tasks across eight target languages. Because no single set of techniques performs the best across all tasks, we encourage practitioners to explore various configurations of the techniques described in this work when seeking to improve on zero-shot training.

Computation and Language

On some sums involving the integral part function

108 - Kui Liu , Jie Wu , Zhishan Yang 2021

Denote by $\tau_k(n)$, $\omega(n)$ and $\mu_2(n)$ the number of representations of $n$ as product of $k$ natural numbers, the number of distinct prime factors of $n$ and the characteristic function of the square-free integers, respectively. Let $[t]$ be the integral part of real number $t$. For $f = \omega, 2^{\omega}, \mu_2, \tau_k$, we prove that $$\sum_{n \le x} f\left(\left[\frac{x}{n}\right]\right) = x \sum_{d \ge 1} \frac{f(d)}{d(d+1)} + O_{\epsilon}\left(x^{\theta_f + \epsilon}\right)$$ for $x \rightarrow \infty$, where $\theta_{\omega} = \frac{53}{110}$, $\theta_{2^{\omega}} = \frac{9}{19}$, $\theta_{\mu_2} = \frac{2}{5}$, $\theta_{\tau_k} = \frac{5k-1}{10k-1}$ and $\epsilon > 0$ is an arbitrarily small positive number. These improve the corresponding results of Bordellès.

Number Theory
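The sum studied in the abstract above can be explored numerically. The following is a minimal sketch, not taken from the paper: it evaluates $\sum_{n \le x} f([x/n])$ for $f = \omega$ both directly and by grouping the terms with $[x/n] = d$, using the elementary fact that exactly $[x/d] - [x/(d+1)]$ integers $n$ give that value. The function names `omega_table`, `direct_sum`, `grouped_sum` and `truncated_main_term` are illustrative choices, and the main term is truncated to finitely many terms, so it only approximates the infinite series.

```python
def omega_table(limit):
    """Return a list where entry n is the number of distinct prime factors of n."""
    omega = [0] * (limit + 1)
    for p in range(2, limit + 1):
        if omega[p] == 0:  # no smaller prime divides p, so p is prime
            for m in range(p, limit + 1, p):
                omega[m] += 1
    return omega


def direct_sum(f, x):
    """Sum of f([x/n]) over 1 <= n <= x, term by term."""
    return sum(f[x // n] for n in range(1, x + 1))


def grouped_sum(f, x):
    """Same sum, grouped by the value d = [x/n].

    [x/n] = d holds exactly for the [x/d] - [x/(d+1)] integers n
    in the half-open interval (x/(d+1), x/d].
    """
    return sum(f[d] * (x // d - x // (d + 1)) for d in range(1, x + 1))


def truncated_main_term(f, x, terms):
    """x times the first `terms` terms of sum f(d) / (d (d + 1)).

    This truncation is an assumption made for illustration; the
    abstract's main term uses the full infinite series.
    """
    return x * sum(f[d] / (d * (d + 1)) for d in range(1, terms + 1))


if __name__ == "__main__":
    x = 10_000
    omega = omega_table(x)
    print("direct sum:         ", direct_sum(omega, x))
    print("grouped sum:        ", grouped_sum(omega, x))
    print("truncated main term:", truncated_main_term(omega, x, x))
```

The grouped form makes it plausible why the main term has the weight $1/(d(d+1))$: for large $x$, $[x/d] - [x/(d+1)]$ is close to $x/d - x/(d+1) = x/(d(d+1))$. The hard part addressed by the paper, bounding the resulting error, is not reproduced here.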

PAENet: A Progressive Attention-Enhanced Network for 3D to 2D Retinal Vessel Segmentation

120 - Zhuojie Wu , Muyi Sun 2021

3D to 2D retinal vessel segmentation is a challenging problem in Optical Coherence Tomography Angiography (OCTA) images. Accurate retinal vessel segmentation is important for the diagnosis and prevention of ophthalmic diseases. However, making full use of the 3D data of OCTA volumes is a vital factor for obtaining satisfactory segmentation results. In this paper, we propose a Progressive Attention-Enhanced Network (PAENet) based on attention mechanisms to extract rich feature representation. Specifically, the framework consists of two main parts, the three-dimensional feature learning path and the two-dimensional segmentation path. In the three-dimensional feature learning path, we design a novel Adaptive Pooling Module (APM) and propose a new Quadruple Attention Module (QAM). The APM captures dependencies along the projection direction of volumes and learns a series of pooling coefficients for feature fusion, which efficiently reduces feature dimension. In addition, the QAM reweights the features by capturing four-group cross-dimension dependencies, which makes maximum use of 4D feature tensors. In the two-dimensional segmentation path, to acquire more detailed information, we propose a Feature Fusion Module (FFM) to inject 3D information into the 2D path. Meanwhile, we adopt the Polarized Self-Attention (PSA) block to model the semantic interdependencies in spatial and channel dimensions respectively. Extensive experiments on the OCTA-500 dataset show that our proposed algorithm achieves state-of-the-art performance compared with previous methods.

Image and Video Processing, Computer Vision and Pattern Recognition

Normal Learning in Videos with Attention Prototype Network

306 - Chao Hu , Fan Wu , Weijie Wu 2021

Frame reconstruction (current or future frame) based on Auto-Encoder (AE) is a popular method for video anomaly detection. With models trained on the normal data, the reconstruction errors of anomalous scenes are usually much larger than those of normal ones. Previous methods introduced the memory bank into AE, for encoding diverse normal patterns across the training videos. However, they are memory consuming and cannot cope with unseen new scenarios in the testing data. In this work, we propose a self-attention prototype unit (APU) to encode the normal latent space as prototypes in real time, free from extra memory cost. In addition, we introduce a circulative attention mechanism to our backbone to form a novel feature extracting learner, namely Circulative Attention Unit (CAU). It enables fast adaptation to new scenes with only a few update iterations. Extensive experiments are conducted on various benchmarks. The superior performance over the state-of-the-art demonstrates the effectiveness of our method. Our code is available at https://github.com/huchao-AI/APN/.

Computer Vision and Pattern Recognition

Online Multi-Granularity Distillation for GAN Compression

145 - Yuxi Ren , Jie Wu , Xuefeng Xiao 2021

Generative Adversarial Networks (GANs) have witnessed prevailing success in yielding outstanding images; however, they are burdensome to deploy on resource-constrained devices due to ponderous computational costs and hulking memory usage. Although recent efforts on compressing GANs have acquired remarkable results, potential model redundancies still exist and the models can be further compressed. To solve this issue, we propose a novel online multi-granularity distillation (OMGD) scheme to obtain lightweight GANs, which contributes to generating high-fidelity images with low computational demands. We offer the first attempt to popularize single-stage online distillation for GAN-oriented compression, where the progressively promoted teacher generator helps to refine the discriminator-free based student generator. Complementary teacher generators and network layers provide comprehensive and multi-granularity concepts to enhance visual fidelity from diverse dimensions. Experimental results on four benchmark datasets demonstrate that OMGD succeeds in compressing 40x MACs and 82.5x parameters on Pix2Pix and CycleGAN, without loss of image quality. It reveals that OMGD provides a feasible solution for the deployment of real-time image translation on resource-constrained devices. Our code and models are made public at: https://github.com/bytedance/OMGD.

Computer Vision and Pattern Recognition

Classification of solutions of the 2D steady Navier-Stokes equations with separated variables in cone-like domains

133 - Wendong Wang , Jie Wu 2021

We investigate the problem of classification of solutions for the steady Navier-Stokes equations in any cone-like domains. In the form of separated variables, $$u(x,y)=\left( \begin{array}{c} \varphi_1(r)v_1(\theta) \\ \varphi_2(r)v_2(\theta) \end{array} \right),$$ where $x=r\cos\theta$ and $y=r\sin\theta$ in polar coordinates, we obtain the expressions of all smooth solutions with $C^0$ Dirichlet boundary condition. In particular, it shows that (i) some solutions are found, which are Hölder continuous on the boundary, but their gradients blow up at the corner; (ii) all solutions in the entire plane of $\mathbb{R}^2$ like harmonic functions or Stokes equations, are polynomial expressions.

Analysis of PDEs

Weakly-Supervised Spatio-Temporal Anomaly Detection in Surveillance Video

162 - Jie Wu , Wei Zhang , Guanbin Li 2021

In this paper, we introduce a novel task, referred to as Weakly-Supervised Spatio-Temporal Anomaly Detection (WSSTAD) in surveillance video. Specifically, given an untrimmed video, WSSTAD aims to localize a spatio-temporal tube (i.e., a sequence of bounding boxes at consecutive times) that encloses the abnormal event, with only coarse video-level annotations as supervision during training. To address this challenging task, we propose a dual-branch network which takes as input the proposals with multi-granularities in both spatial-temporal domains. Each branch employs a relationship reasoning module to capture the correlation between tubes/videolets, which can provide rich contextual information and complex entity relationships for the concept learning of abnormal behaviors. A Mutually-guided Progressive Refinement framework is set up to employ dual-path mutual guidance in a recurrent manner, iteratively sharing auxiliary supervision information across branches. It impels the learned concepts of each branch to serve as a guide for its counterpart, which progressively refines the corresponding branch and the whole framework. Furthermore, we contribute two datasets, i.e., ST-UCF-Crime and STRA, consisting of videos containing spatio-temporal abnormal annotations to serve as the benchmarks for WSSTAD. We conduct extensive qualitative and quantitative evaluations to demonstrate the effectiveness of the proposed approach and analyze the key factors that contribute most to handling this task.

Computer Vision and Pattern Recognition

Exploiting Spiking Dynamics with Spatial-temporal Feature Normalization in Graph Learning

44 - Mingkun Xu , Yujie Wu , Lei Deng 2021

Biological spiking neurons with intrinsic dynamics underlie the powerful representation and learning capabilities of the brain for processing multimodal information in complex environments. Despite recent tremendous progress in spiking neural networks (SNNs) for handling Euclidean-space tasks, it still remains challenging to exploit SNNs in processing non-Euclidean-space data represented by graph data, mainly due to the lack of an effective modeling framework and useful training techniques. Here we present a general spike-based modeling framework that enables the direct training of SNNs for graph learning. Through spatial-temporal unfolding for spiking data flows of node features, we incorporate graph convolution filters into spiking dynamics and formalize a synergistic learning paradigm. Considering the unique features of spike representation and spiking dynamics, we propose a spatial-temporal feature normalization (STFN) technique suitable for SNN to accelerate convergence. We instantiate our methods into two spiking graph models, including graph convolution SNNs and graph attention SNNs, and validate their performance on three node-classification benchmarks, including Cora, Citeseer, and Pubmed. Our model can achieve comparable performance with the state-of-the-art graph neural network (GNN) models with much lower computation costs, demonstrating great benefits for the execution on neuromorphic hardware and prompting neuromorphic applications in graphical scenarios.

Neural and Evolutionary Computing, Machine Learning

Approach the Fundamental Limit of Orbital Angular Momentum Multiplexing

155 - Shuai S. A. Yuan , Jie Wu , Menglin L. N. Chen 2021

Establishing and approaching the fundamental limit of orbital angular momentum (OAM) multiplexing are paramountly important and increasingly urgent for current multiple-input multiple-output research. In this work, we elaborate the fundamental limit in terms of independent scattering channels (or degrees of freedom of scattered fields) through angular-spectral analysis, in conjunction with a transformation of basis. The scattering channel limit is universal for arbitrary spatial mode multiplexing, which is launched by a planar electromagnetic device, such as antenna, metasurface, etc, with a predefined physical size. As a proof of concept, we demonstrate both theoretically and experimentally the limit by a metasurface hologram that transforms orthogonal OAM modes to plane-wave modes scattered at critically separated angular-spectral regions. Particularly, a minimax optimization algorithm is applied to suppress angular spectrum aliasing, achieving good performances in both full-wave simulation and experimental measurement at microwave frequencies. This work offers a theoretical upper bound and corresponding approach route for engineering designs of OAM multiplexing.

Applied Physics, Optics

FloorPP-Net: Reconstructing Floor Plans using Point Pillars for Scan-to-BIM

37 - Yijie Wu , Fan Xue 2021

This paper presents a deep learning-based point cloud processing method named FloorPP-Net for the task of Scan-to-BIM (building information model). FloorPP-Net first converts the input point cloud of a building story into point pillars (PP), then predicts the corners and edges to output the floor plan. Altogether, FloorPP-Net establishes an end-to-end supervised learning framework for the Scan-to-Floor-Plan (Scan2FP) task. In the 1st International Scan-to-BIM Challenge held in conjunction with CVPR 2021, FloorPP-Net was ranked the second runner-up in the floor plan reconstruction track. Future work includes general edge proposals, 2D plan regularization, and 3D BIM reconstruction.

Computer Vision and Pattern Recognition
