101 - Xinwei He, Silin Cheng, Song Bai 2021
Learning 3D representations by fusing point cloud and multi-view data has been proven to be fairly effective. While prior works typically focus on exploiting global features of the two modalities, in this paper we argue that more discriminative features can be derived by modeling where to fuse. To investigate this, we propose a novel Correspondence-Aware Point-view Fusion Net (CAP-Net). The core element of CAP-Net is a module named Correspondence-Aware Fusion (CAF), which integrates the local features of the two modalities based on their correspondence scores. We further propose to filter out correspondence scores with low values to obtain salient local correspondences, which reduces redundancy in the fusion process. In CAP-Net, we use the CAF modules to fuse the multi-scale features of the two modalities both bidirectionally and hierarchically in order to obtain more informative features. Comprehensive evaluations on popular 3D shape benchmarks covering 3D object classification and retrieval show the superiority of the proposed framework.
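The idea of fusing local features by correspondence score, and discarding low scores, can be sketched in a few lines. This is only an illustration under stated assumptions, not the CAF module itself: the dot-product score, the top-k filter, the softmax weighting, the additive fusion, and the function name `fuse_point_with_views` are all hypothetical choices that the abstract does not specify.

```python
import math


def dot(u, v):
    """Dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def fuse_point_with_views(point_feat, view_feats, top_k=2):
    """Fuse one local point feature with its best-matching local view features.

    Assumptions (not from the abstract): scores are dot products, only the
    top_k scores are kept, and kept view features are added to the point
    feature with softmax weights.
    """
    scores = [dot(point_feat, v) for v in view_feats]
    # Filter: keep the indices of the top_k highest correspondence scores.
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Softmax over the surviving scores (shifted by the max for stability).
    max_score = max(scores[i] for i in keep)
    weights = {i: math.exp(scores[i] - max_score) for i in keep}
    total = sum(weights.values())
    fused = list(point_feat)
    for i in keep:
        w = weights[i] / total
        for d, value in enumerate(view_feats[i]):
            fused[d] += w * value
    return fused, keep


point = [1.0, 0.0]
views = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
fused, kept = fuse_point_with_views(point, views, top_k=1)
print(fused, kept)
```

With `top_k=1` only the best-aligned view feature contributes, which mirrors the stated goal of reducing redundancy in the fusion step.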
Adversarial attacks are feasible in the real world for object detection. However, most previous works have tried to learn patches applied to an object to fool detectors, which become less effective or even ineffective at squint view angles. To address this issue, we propose the Dense Proposals Attack (DPA) to learn robust, physical and targeted adversarial camouflages for detectors. The camouflages are robust because they remain adversarial when filmed from arbitrary viewpoints and under different illumination conditions, physical because they function well both in the 3D virtual scene and the real world, and targeted because they can cause detectors to misidentify an object as a specific target class. In order to make the generated camouflages robust in the physical world, we introduce a combination of viewpoint shifts, lighting and other natural transformations to model the physical phenomena. In addition, to improve the attacks, DPA substantially attacks all the classifications in the fixed region proposals. Moreover, we build a virtual 3D scene using the Unity simulation engine to fairly and reproducibly evaluate different physical attacks. Extensive experiments demonstrate that DPA outperforms the state-of-the-art methods significantly and generalizes well to the real world, posing a potential threat to security-critical computer vision systems.
65 - WenLin Chen, Fang Lu, Yan Dong 2021
Sparse random linear network coding (SRLNC), used as a class of erasure codes to ensure the reliability of multicast communications, has been widely investigated. However, an exact expression for the decoding success probability of SRLNC is still unknown, and existing expressions are either asymptotic or approximate. In this paper, we derive an exact expression for the decoding success probability of SRLNC. The key to achieving this is a criterion for deciding whether a vector is contained in a subspace. To obtain this criterion, we construct a basis of the subspace from a maximal linearly independent set of the columns of a matrix, such that the coordinates of a vector with respect to this basis are known. The exactness and the computation of the derived expression are demonstrated by a simple example.
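The quantity in question can be made concrete with a small simulation. The sketch below does not reproduce the exact expression derived in the paper; it is a Monte Carlo estimate under simplifying assumptions: the field is GF(2), each coding coefficient is nonzero independently with probability `p`, and decoding succeeds when the received coding vectors have full rank `k`. All function and parameter names are hypothetical.

```python
import random


def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    basis = {}  # leading-bit position -> stored row with that leading bit
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in basis:
                basis[lead] = row
                break
            row ^= basis[lead]  # cancel the leading bit and keep reducing
    return len(basis)


def sparse_coding_vector(k, p, rng):
    """Draw a length-k GF(2) coding vector; each entry is 1 with probability p."""
    mask = 0
    for i in range(k):
        if rng.random() < p:
            mask |= 1 << i
    return mask


def estimate_decoding_success(k, n, p, trials=2000, seed=0):
    """Fraction of trials in which n sparse coding vectors span GF(2)^k."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        rows = [sparse_coding_vector(k, p, rng) for _ in range(n)]
        if gf2_rank(rows) == k:
            successes += 1
    return successes / trials


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5):
        print(p, estimate_decoding_success(k=16, n=20, p=p))
```

An estimate of this kind is useful mainly as a sanity check against a closed-form expression, since simulation error shrinks only with the square root of the number of trials.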
In recent decades, many studies have suggested that phase information is crucial for speech enhancement (SE), and time-domain single-channel speech enhancement techniques have shown promise in noise suppression and robust automatic speech recognition (ASR). This paper continues these lines of research and explores two effective SE methods that consider phase information in the time domain and the frequency domain of speech signals, respectively. Going one step further, we put forward a novel cross-domain speech enhancement model and a bi-projection fusion (BPF) mechanism for noise-robust ASR. To evaluate the effectiveness of our proposed method, we conduct an extensive set of experiments on the publicly available Aishell-1 Mandarin benchmark speech corpus. The evaluation results confirm the superiority of our proposed method over several current top-of-the-line time-domain and frequency-domain SE methods in both enhancement and ASR evaluation metrics, on test sets contaminated with seen and unseen noise, respectively.
Segmentation of images is a long-standing challenge in medical AI. This is mainly due to the fact that training a neural network to perform image segmentation requires a significant number of pixel-level annotated data, which is often unavailable. To address this issue, we propose a semi-supervised image segmentation technique based on the concept of multi-view learning. In contrast to the previous art, we introduce an adversarial form of dual-view training and employ a critic to formulate the learning problem in multi-view training as a min-max problem. Thorough quantitative and qualitative evaluations on several datasets indicate that our proposed method outperforms state-of-the-art medical image segmentation algorithms consistently and comfortably. The code is publicly available at https://github.com/himashi92/Duo-SegNet
144 - Ning Ding, Yulin Chen, Xu Han 2021
As an effective approach to tune pre-trained language models (PLMs) for specific tasks, prompt-learning has recently attracted much attention from researchers. By using cloze-style language prompts to stimulate the versatile knowledge of PLMs, prompt-learning can achieve promising results on a series of NLP tasks, such as natural language inference, sentiment classification, and knowledge probing. In this work, we investigate the application of prompt-learning to fine-grained entity typing in fully supervised, few-shot and zero-shot scenarios. We first develop a simple and effective prompt-learning pipeline by constructing entity-oriented verbalizers and templates and conducting masked language modeling. Further, to tackle the zero-shot regime, we propose a self-supervised strategy that carries out distribution-level optimization in prompt-learning to automatically summarize the information of entity types. Extensive experiments on three fine-grained entity typing benchmarks (with up to 86 classes) under fully supervised, few-shot and zero-shot settings show that prompt-learning methods significantly outperform fine-tuning baselines, especially when the training data is insufficient.
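To make the roles of the template and the verbalizer concrete, here is a minimal sketch without any language model. The template wording, the label words, the mean aggregation, and the hand-written mask-word probabilities are all illustrative assumptions; in a real pipeline the probabilities would come from a masked language model's prediction at the mask position.

```python
def build_prompt(sentence, mention, mask_token="[MASK]"):
    """Wrap a sentence in a hypothetical entity-oriented cloze template."""
    return f"{sentence} In this sentence, {mention} is a {mask_token}."


# Hypothetical verbalizer: each entity type maps to a set of label words.
VERBALIZER = {
    "person/artist": ["singer", "painter", "musician"],
    "organization/company": ["company", "firm"],
}


def predict_type(mask_word_probs, verbalizer=VERBALIZER):
    """Score each type by the mean probability of its label words."""
    scores = {}
    for label, words in verbalizer.items():
        scores[label] = sum(mask_word_probs.get(w, 0.0) for w in words) / len(words)
    best = max(scores, key=scores.get)
    return best, scores


prompt = build_prompt("Adele released a new album.", "Adele")
# Stand-in for a masked language model's output at the mask position.
fake_probs = {"singer": 0.6, "company": 0.1}
label, scores = predict_type(fake_probs)
print(prompt)
print(label)
```

Averaging over label words is only one possible aggregation; a weighted or learned combination is equally plausible.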
139 - Fulin Chen, Shaobin Tan, Nina Yu 2021
For any nullity $2$ extended affine Lie algebra $\mathcal{E}$ of maximal type and $\ell\in\mathbb{C}$, we prove that there exist a vertex algebra $V_{\mathcal{E}}(\ell)$ and an automorphism group $G$ of $V_{\mathcal{E}}(\ell)$ equipped with a linear character $\chi$, such that the category of restricted $\mathcal{E}$-modules of level $\ell$ is canonically isomorphic to the category of $(G,\chi)$-equivariant $\phi$-coordinated quasi $V_{\mathcal{E}}(\ell)$-modules. Moreover, when $\ell$ is a nonnegative integer, there is a quotient vertex algebra $L_{\mathcal{E}}(\ell)$ of $V_{\mathcal{E}}(\ell)$ modulo a $G$-stable ideal, and we prove that the integrable restricted $\mathcal{E}$-modules of level $\ell$ are exactly the $(G,\chi)$-equivariant $\phi$-coordinated quasi $L_{\mathcal{E}}(\ell)$-modules.
The task of skeleton-based action recognition remains a core challenge in human-centred scene understanding due to the multiple granularities and large variation in human motion. Existing approaches typically employ a single neural representation for different motion patterns, which has difficulty capturing fine-grained action classes given limited training data. To address these problems, we propose a novel multi-granular spatio-temporal graph network for skeleton-based action classification that jointly models coarse- and fine-grained skeleton motion patterns. To this end, we develop a dual-head graph network consisting of two interleaved branches, which enables us to extract features at two spatio-temporal resolutions in an effective and efficient manner. Moreover, our network utilises a cross-head communication strategy to mutually enhance the representations of both heads. We conduct extensive experiments on three large-scale datasets, namely NTU RGB+D 60, NTU RGB+D 120, and Kinetics-Skeleton, and achieve state-of-the-art performance on all the benchmarks, which validates the effectiveness of our method.
Reversible data hiding in encrypted images is an effective technique for data hiding and preserving image privacy. In this paper, we propose a novel scheme based on polynomial arithmetic, which achieves a high embedding capacity with perfect recovery of the original image. An efficient two-layer symmetric encryption method is applied to protect the privacy of the original image. One polynomial is generated from the encryption key and a group of encrypted pixels, and the secret data is mapped into another polynomial. Through the arithmetic of these two polynomials, the purpose of this work is achieved. Furthermore, pixel value mapping is designed to reduce the size of auxiliary data, which can further improve embedding capacity. Experimental results demonstrate that our solution has stable and good performance on various images. Compared with some state-of-the-art methods, the proposed method achieves better decrypted image quality with a large embedding capacity.
A large unidirectional magnetoresistance (UMR) ratio of UMR/$R_{xx} \sim 0.36\%$ is found in W/CoFeB metallic bilayer heterostructures at room temperature. Three different regimes in terms of the current dependence of the UMR ratio are identified: a spin-dependent-scattering mechanism regime at small current densities $J \sim 10^{9}$ A/m$^{2}$ (UMR ratio $\propto J$), a spin-magnon-interaction mechanism regime at intermediate $J \sim 10^{10}$ A/m$^{2}$ (UMR ratio $\propto J^{3}$), and a spin-transfer torque (STT) regime at $J \sim 10^{11}$ A/m$^{2}$ (UMR ratio independent of $J$). We verify the direct correlation between this large UMR and the transfer of spin angular momentum from the W layer to the CoFeB layer by both field-dependent and current-dependent UMR characterizations. Numerical simulations further confirm that the large STT-UMR stems from the tilting of the magnetization affected by the spin Hall effect-induced spin-transfer torques. An alternative approach to estimate damping-like spin-torque efficiencies from magnetic heterostructures is also proposed.