Research papers and master's and doctoral theses published by Kai Sun

PlaTe: Visually-Grounded Planning with Transformers in Procedural Tasks

90 - Jiankai Sun , De-An Huang , Bo Lu 2021

In this work, we study the problem of how to leverage instructional videos to facilitate the understanding of human decision-making processes, focusing on training a model with the ability to plan a goal-directed procedure from real-world videos. Learning structured and plannable state and action spaces directly from unstructured videos is the key technical challenge of our task. There are two problems: first, the appearance gap between the training and validation datasets could be large for unstructured videos; second, these gaps lead to decision errors that compound over the steps. We address these limitations with Planning Transformer (PlaTe), which has the advantage of circumventing the compounding prediction errors that occur with single-step models during long model-based rollouts. Our method simultaneously learns the latent state and action information of assigned tasks and the representations of the decision-making process from human demonstrations. Experiments conducted on real-world instructional videos and an interactive environment show that our method can achieve a better performance in reaching the indicated goal than previous algorithms. We also validated the possibility of applying procedural tasks on a UR-5 platform.

Robotics

Partial Symbol Recovery for Interference Resilience in Low-Power Wide Area Networks

137 - Kai Sun , Zhimeng Yin , Weiwei Chen 2021

Recent years have witnessed the proliferation of Low-power Wide Area Networks (LPWANs) in the unlicensed band for various Internet-of-Things (IoT) applications. Due to the ultra-low transmission power and long transmission duration, LPWAN devices inevitably suffer from high power Cross Technology Interference (CTI), such as interference from Wi-Fi, coexisting in the same spectrum. To alleviate this issue, this paper introduces the Partial Symbol Recovery (PSR) scheme for improving the CTI resilience of LPWAN. We verify our idea on LoRa, a widely adopted LPWAN technique, as a proof of concept. At the PHY layer, although CTI has much higher power, its duration is relatively shorter compared with LoRa symbols, leaving part of a LoRa symbol uncorrupted. Moreover, due to its high redundancy, LoRa chips within a symbol are highly correlated. This opens the possibility of detecting a LoRa symbol with only part of the chips. By examining the unique frequency patterns in LoRa symbols with time-frequency analysis, our design effectively detects the clean LoRa chips that are free of CTI. This enables PSR to only rely on clean LoRa chips for successfully recovering from communication failures. We evaluate our PSR design with real-world testbeds, including SX1280 LoRa chips and USRP B210, under Wi-Fi interference in various scenarios. Extensive experiments demonstrate that our design offers reliable packet recovery performance, successfully boosting the LoRa packet reception ratio from 45.2% to 82.2% with a performance gain of 1.8 times.
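The idea that a LoRa symbol can be detected from only part of its chips can be illustrated with a small sketch. This is not the authors' PSR implementation: it assumes an idealized, noise-free, LoRa-style chirp with one sample per chip, and it assumes the set of clean chips is already known (the paper detects them with time-frequency analysis, which is not reproduced here). The names `lora_symbol`, `demodulate`, and the interferer parameters are hypothetical.

```python
import cmath
import math


def lora_symbol(value, sf):
    """Idealized LoRa-style upchirp carrying `value`, one sample per chip."""
    n_chips = 2 ** sf
    return [
        cmath.exp(2j * math.pi * (n * n / (2 * n_chips) + value * n / n_chips))
        for n in range(n_chips)
    ]


def demodulate(samples, sf, clean):
    """Dechirp, zero out chips not marked clean, return the strongest DFT bin."""
    n_chips = 2 ** sf
    dechirped = [
        samples[n] * cmath.exp(-2j * math.pi * n * n / (2 * n_chips)) if clean[n] else 0j
        for n in range(n_chips)
    ]

    def bin_magnitude(k):
        return abs(sum(d * cmath.exp(-2j * math.pi * k * n / n_chips)
                       for n, d in enumerate(dechirped)))

    return max(range(n_chips), key=bin_magnitude)


SF = 7                      # assumed spreading factor for the demo
sent = 42                   # assumed symbol value
n_chips = 2 ** SF

received = lora_symbol(sent, SF)
# Assumed interferer: a strong tone covering only the first half of the symbol.
for n in range(n_chips // 2):
    received[n] += 10 * cmath.exp(2j * math.pi * 0.3 * n)

# Assumption: the clean chips (second half) are known in advance.
clean = [n >= n_chips // 2 for n in range(n_chips)]
recovered = demodulate(received, SF, clean)
print(recovered)
```

Because the dechirped symbol is a single tone across all chips, the clean half alone still produces a dominant peak at the transmitted value, so `recovered` should equal `sent` in this idealized setting.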

Networking and Internet Architecture, Signal Processing

An Empirical Study on Leveraging Position Embeddings for Target-oriented Opinion Words Extraction

89 - Samuel Mensah , Kai Sun , Nikolaos Aletras 2021

Target-oriented opinion words extraction (TOWE) (Fan et al., 2019b) is a new subtask of target-oriented sentiment analysis that aims to extract opinion words for a given aspect in text. Current state-of-the-art methods leverage position embeddings to capture the relative position of a word to the target. However, the performance of these methods depends on the ability to incorporate this information into word representations. In this paper, we explore a variety of text encoders based on pretrained word embeddings or language models that leverage part-of-speech and position embeddings, aiming to examine the actual contribution of each component in TOWE. We also adapt a graph convolutional network (GCN) to enhance word representations by incorporating syntactic information. Our experimental results demonstrate that BiLSTM-based models can effectively encode position information into word representations while using a GCN only achieves marginal gains. Interestingly, our simple methods outperform several state-of-the-art complex neural structures.

Computation and Language, Artificial Intelligence

Fine-Grained Chemical Entity Typing with Multimodal Knowledge Representation

325 - Chenkai Sun , Weijiang Li , Jinfeng Xiao 2021

Automated knowledge discovery from trending chemical literature is essential for more efficient biomedical research. How to extract detailed knowledge about chemical reactions from the core chemistry literature is a new emerging challenge that has not been well studied. In this paper, we study the new problem of fine-grained chemical entity typing, which poses interesting new challenges especially because of the complex name mentions frequently occurring in chemistry literature and graphic representation of entities. We introduce a new benchmark data set (CHEMET) to facilitate the study of the new task and propose a novel multi-modal representation learning framework to solve the problem of fine-grained chemical entity typing by leveraging external resources with chemical structures and using cross-modal attention to learn effective representation of text in the chemistry domain. Experiment results show that the proposed framework outperforms multiple state-of-the-art methods.

Computation and Language, Machine Learning

A topological Dirac-vortex parametric phonon laser

305 - Xiang Xi , Jingwen Ma , Xiankai Sun 2021

Nonlinear topological photonic and phononic systems have recently aroused intense interest in exploring new phenomena that have no counterparts in electronic systems. The squeezed bosonic interaction in these systems is particularly interesting, because it can modify the vacuum fluctuations of topological states, drive them into instabilities, and lead to topological parametric lasers. However, these phenomena remain experimentally elusive because of limited nonlinearities in most existing topological bosonic systems. Here, we experimentally realized topological parametric lasers based on nonlinear nanoelectromechanical Dirac-vortex cavities with strong squeezed interaction. Specifically, we parametrically drove the Dirac-vortex cavities to provide phase-sensitive amplification for topological phonons, and observed phonon lasing above the threshold. Additionally, we confirmed that the lasing frequency is robust against fabrication disorders and that the free spectral range defies the universal inverse scaling law with increased cavity size, which benefit the realization of large-area single-mode lasers. Our results represent an important advance in experimental investigations of topological physics with large bosonic nonlinearities and parametric gain.

Mesoscale and Nanoscale Physics, Applied Physics, Optics

Defending against Reconstruction Attack in Vertical Federated Learning

110 - Jiankai Sun , Yuanshun Yao , Weihao Gao 2021

Recently, researchers have studied input leakage problems in Federated Learning (FL), where a malicious party can reconstruct sensitive training inputs provided by users from shared gradients. This raises concerns about FL, since input leakage contradicts the privacy-preserving intention of using FL. Despite a relatively rich literature on attacks and defenses of input reconstruction in Horizontal FL, input leakage and protection in Vertical FL have only recently started to draw researchers' attention. In this paper, we study how to defend against input leakage attacks in Vertical FL. We design an adversarial training-based framework that contains three modules: adversarial reconstruction, noise regularization, and distance correlation minimization. These modules can be employed individually or applied together, since they are independent of each other. Through extensive experiments on a large-scale industrial online advertising dataset, we show our framework is effective in protecting input privacy while retaining the model utility.

Machine Learning, Cryptography and Security

Vertical Federated Learning without Revealing Intersection Membership

117 - Jiankai Sun , Xin Yang , Yuanshun Yao 2021

Vertical Federated Learning (vFL) allows multiple parties that own different attributes (e.g. features and labels) of the same data entity (e.g. a person) to jointly train a model. To prepare the training data, vFL needs to identify the common data entities shared by all parties. It is usually achieved by Private Set Intersection (PSI) which identifies the intersection of training samples from all parties by using personal identifiable information (e.g. email) as sample IDs to align data instances. As a result, PSI would make sample IDs of the intersection visible to all parties, and therefore each party can know that the data entities shown in the intersection also appear in the other parties, i.e. intersection membership. However, in many real-world privacy-sensitive organizations, e.g. banks and hospitals, revealing membership of their data entities is prohibited. In this paper, we propose a vFL framework based on Private Set Union (PSU) that allows each party to keep sensitive membership information to itself. Instead of identifying the intersection of all training samples, our PSU protocol generates the union of samples as training instances. In addition, we propose strategies to generate synthetic features and labels to handle samples that belong to the union but not the intersection. Through extensive experiments on two real-world datasets, we show our framework can protect the privacy of the intersection membership while maintaining the model utility.
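The difference between aligning samples by intersection and by union can be sketched with plain Python sets. This only illustrates the data alignment described above; it offers no privacy at all, since real PSI and PSU are cryptographic protocols that are not modeled here. The function names, the sample IDs, and the constant placeholder values standing in for the paper's synthetic features and labels are all assumptions made for illustration.

```python
def align_by_intersection(features, labels):
    """PSI-style alignment: keep only IDs that both parties hold."""
    shared_ids = sorted(features.keys() & labels.keys())
    return [(i, features[i], labels[i]) for i in shared_ids]


def align_by_union(features, labels, synthetic_features, synthetic_label):
    """PSU-style alignment: keep every ID, filling gaps with synthetic values."""
    all_ids = sorted(features.keys() | labels.keys())
    return [
        (i, features.get(i, synthetic_features), labels.get(i, synthetic_label))
        for i in all_ids
    ]


# Hypothetical toy data: party A owns features, party B owns labels.
party_a_features = {"u1": [0.2, 1.0], "u2": [0.7, 0.1]}
party_b_labels = {"u2": 1, "u3": 0}

psi_rows = align_by_intersection(party_a_features, party_b_labels)
psu_rows = align_by_union(party_a_features, party_b_labels, [0.0, 0.0], 0)

print(psi_rows)
print(psu_rows)
```

In the intersection case, every returned ID reveals that the other party also holds that entity. In the union case, an ID appearing in the training set no longer implies membership in the other party's data, at the cost of having to synthesize the missing side.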

Machine Learning, Artificial Intelligence

Multi-scale super-resolution generation of low-resolution scanned pathological images

95 - Kai Sun 2021

Background. Digital pathology has aroused widespread interest in modern pathology. The key of digitalization is to scan the whole slide image (WSI) at high magnification. The larger the magnification is, the richer details WSI will provide, but the scanning time is longer and the resulting file size is larger. Methods. We design a strategy to scan slides with low resolution (5X) and a super-resolution method is proposed to restore the image details when in diagnosis. The method is based on a multi-scale generative adversarial network, which sequentially generates three high-resolution images at 10X, 20X and 40X. Results. The peak-signal-to-noise-ratio of 10X to 40X generated images are 24.16, 22.27 and 20.44, and the structural-similarity-index are 0.845, 0.680 and 0.512, which are better than other super-resolution networks. Visual scoring average and standard deviation from three pathologists is 3.63 ± 0.52, 3.70 ± 0.57 and 3.74 ± 0.56 and the p value of analysis of variance is 0.37, indicating that generated images include sufficient information for diagnosis. The average value of Kappa test is 0.99, meaning the diagnosis of generated images is highly consistent with that of the real images. Conclusion. This proposed method can generate high-quality 10X, 20X, 40X images from 5X images at the same time, in which the time and storage costs of digitalization can be effectively reduced up to 1/64 of the previous costs. The proposed method provides a better alternative for low-cost storage, faster image share of digital pathology. Keywords. Digital pathology; Super-resolution; Low resolution scanning; Low cost
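The 1/64 figure is consistent with a simple scaling argument, sketched below under the assumption (not stated in the abstract) that pixel count, and hence scanning and storage cost, grows with the square of the linear magnification. The helper `pixel_ratio` is a hypothetical name.

```python
def pixel_ratio(low_magnification, high_magnification):
    """Ratio of pixel counts between two scans of the same slide area,
    assuming pixel count scales with the square of linear magnification."""
    linear_factor = high_magnification / low_magnification
    return linear_factor ** 2


for target in (10, 20, 40):
    ratio = pixel_ratio(5, target)
    print(f"5X -> {target}X: {ratio:.0f}x more pixels")
```

Under that assumption, a 40X scan holds (40/5)^2 = 64 times as many pixels as a 5X scan of the same region, which matches the upper bound on savings quoted in the abstract.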

Image and Video Processing, Computer Vision and Pattern Recognition

Demonstrating shareability of multipartite Einstein-Podolsky-Rosen steering

98 - Ze-Yan Hao , Kai Sun , Yan Wang 2021

Einstein-Podolsky-Rosen (EPR) steering, which is regarded as a category of quantum nonlocal correlations, has an asymmetric property in contrast with entanglement and Bell nonlocality. For multipartite EPR steering, monogamy, which prevents two observers from steering a third one simultaneously, emerges as an essential property. However, more configurations of shareability relations in the reduced subsystem, which are beyond monogamy, could be observed by increasing the number of measurement settings, for which experimental verification is still absent. Here, in an optical experiment, we provide a proof-of-principle demonstration of shareability of EPR steering without the constraint of monogamy in the three-qubit system, in which Alice could be steered by Bob and Charlie simultaneously. Moreover, based on the reduced bipartite EPR steering detection, we verify the genuine three-qubit entanglement. This work provides a basis for an improved understanding of multipartite EPR steering and has potential applications in many quantum information protocols, such as multipartite entanglement detection and quantum cryptography.

Quantum Physics

Experimental quantum phase discrimination enhanced by controllable indistinguishability-based coherence

89 - Kai Sun , Zheng-Hao Liu , Yan Wang 2021

Quantum coherence, a basic feature of quantum mechanics residing in superpositions of quantum states, is a resource for quantum information processing. Coherence emerges in a fundamentally different way for nonidentical and identical particles, in that for the latter a unique contribution exists linked to indistinguishability which cannot occur for nonidentical particles. We experimentally demonstrate by an optical setup this additional contribution to quantum coherence, showing that its amount directly depends on the degree of indistinguishability and exploiting it to run a quantum phase discrimination protocol. Furthermore, the designed setup allows for simulating Fermionic particles with photons, thus assessing the role of particle statistics (Bosons or Fermions) in coherence generation and utilization. Our experiment proves that independent indistinguishable particles can supply a controllable resource of coherence for quantum metrology.

Quantum Physics
