Research papers and master's and doctoral theses published by Xing Wu

ESimCSE: Enhanced Sample Building Method for Contrastive Learning of Unsupervised Sentence Embedding

136 - Xing Wu , Chaochen Gao , Liangjun Zang 2021

Contrastive learning has been attracting much attention for learning unsupervised sentence embeddings. The current state-of-the-art unsupervised method is the unsupervised SimCSE (unsup-SimCSE). Unsup-SimCSE takes dropout as a minimal data augmentation method, and passes the same input sentence to a pre-trained Transformer encoder (with dropout turned on) twice to obtain the two corresponding embeddings that form a positive pair. Because the length of a sentence is generally encoded into its embedding through the position embeddings of the Transformer, each positive pair in unsup-SimCSE contains the same length information. Thus unsup-SimCSE trained with these positive pairs is probably biased, tending to consider sentences of the same or similar length to be more similar in semantics. Through statistical observations, we find that unsup-SimCSE does have such a problem. To alleviate it, we apply a simple repetition operation to modify the input sentence, and then pass the input sentence and its modified counterpart to the pre-trained Transformer encoder, respectively, to get the positive pair. Additionally, we draw inspiration from the computer vision community and introduce momentum contrast, enlarging the number of negative pairs without additional calculations. The two proposed modifications are applied to positive and negative pairs separately, and together build a new sentence embedding method, termed Enhanced Unsup-SimCSE (ESimCSE). We evaluate the proposed ESimCSE on several benchmark datasets for the semantic text similarity (STS) task. Experimental results show that ESimCSE outperforms the state-of-the-art unsup-SimCSE by an average Spearman correlation of 2.02% on BERT-base.

Computation and Language, Artificial Intelligence
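The repetition operation mentioned in the ESimCSE abstract can be illustrated with a minimal sketch. The function name `repeat_words`, the `dup_rate` parameter, and the sampling scheme are assumptions made for illustration; the abstract only states that a simple repetition operation changes the input sentence (and hence its length) before encoding.

```python
import random


def repeat_words(tokens, dup_rate=0.3, rng=None):
    """Return a copy of `tokens` in which a random subset of tokens is repeated.

    `dup_rate` (an assumed knob) bounds the share of tokens that may be
    duplicated; at least one token is always repeated so the length changes.
    """
    rng = rng or random.Random()
    if not tokens:
        return []
    max_dups = min(len(tokens), max(1, int(dup_rate * len(tokens))))
    n_dups = rng.randint(1, max_dups)
    positions = set(rng.sample(range(len(tokens)), n_dups))
    augmented = []
    for index, token in enumerate(tokens):
        augmented.append(token)
        if index in positions:
            augmented.append(token)  # repeat the token right after itself
    return augmented


sentence = "the quick brown fox jumps over the lazy dog".split()
positive_view = repeat_words(sentence, rng=random.Random(0))
```

The original sentence and `positive_view` would then be encoded separately to form a positive pair whose two members differ in length.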

Smoothed Contrastive Learning for Unsupervised Sentence Embedding

351 - Xing Wu , Chaochen Gao , Liangjun Zang 2021

Contrastive learning has been gradually applied to learn high-quality unsupervised sentence embeddings. Among the previous unsupervised methods, the latest state-of-the-art method, as far as we know, is unsupervised SimCSE (unsup-SimCSE). Unsup-SimCSE uses the InfoNCE loss function in the training stage by pulling semantically similar sentences together and pushing apart dissimilar ones. Theoretically, we expect to use larger batches in unsup-SimCSE to get more adequate comparisons among samples and avoid overfitting. However, increasing the batch size does not always lead to improvements, and can even lead to performance degradation when the batch size exceeds a threshold. Through statistical observation, we find that this is probably due to the introduction of low-confidence negative pairs after increasing the batch size. To alleviate this problem, we introduce a simple smoothing strategy upon the InfoNCE loss function, termed Gaussian Smoothing InfoNCE (GS-InfoNCE). Specifically, we add random Gaussian noise vectors as negative samples, which act as a smoothing of the negative sample space. Though simple, the proposed smoothing strategy brings substantial improvements to unsup-SimCSE. We evaluate GS-InfoNCE on the standard semantic text similarity (STS) task. GS-InfoNCE outperforms the state-of-the-art unsup-SimCSE by an average Spearman correlation of 1.38%, 0.72%, 1.17% and 0.28% on the base of BERT-base, BERT-large, RoBERTa-base and RoBERTa-large, respectively.

Computation and Language, Artificial Intelligence
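The smoothing idea in the GS-InfoNCE abstract, adding Gaussian noise vectors as extra negatives, can be sketched for a single anchor as follows. This is a minimal illustration, not the authors' implementation: the function names, cosine similarity, the temperature value, the number of noise vectors, and the zero-mean unit-variance noise are all assumptions.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.05):
    """InfoNCE loss for one anchor: -log softmax of the positive logit."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    peak = max(logits)  # subtract the max for numerical stability
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[0]


def gs_info_nce(anchor, positive, negatives, num_noise=4,
                temperature=0.05, rng=None):
    """InfoNCE with extra Gaussian noise vectors appended as negatives."""
    rng = rng or random.Random()
    dim = len(anchor)
    noise = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
             for _ in range(num_noise)]
    return info_nce(anchor, positive, negatives + noise, temperature)
```

Because the noise vectors only enlarge the denominator, the smoothed loss is never smaller than the plain InfoNCE loss for the same inputs.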

HybrUR: A Hybrid Physical-Neural Solution for Unsupervised Underwater Image Restoration

131 - Shuaizheng Yan , Xingyu Chen , Zhengxing Wu 2021

Robust vision restoration for an underwater image remains a challenging problem. Given the lack of aligned underwater-terrestrial image pairs, an unsupervised method is better suited to this task. However, a purely data-driven unsupervised method usually has difficulty achieving realistic color correction because it lacks optical constraints. In this paper, we propose a data- and physics-driven unsupervised architecture that learns underwater vision restoration from unpaired underwater-terrestrial images. For sufficient domain transformation and detail preservation, the underwater degeneration needs to be explicitly constructed based on the optically unambiguous physics law. Thus, we employ the Jaffe-McGlamery degradation theory to design the generation models, and use neural networks to describe the process of underwater degradation. Furthermore, to overcome the problem of invalid gradients when optimizing the hybrid physical-neural model, we fully investigate the intrinsic correlation between the scene depth and the degradation factors for the backscattering estimation, to improve the restoration performance through physical constraints. Our experimental results show that the proposed method is able to perform high-quality restoration for unconstrained underwater images without any supervision. On multiple benchmarks, we outperform several state-of-the-art supervised and unsupervised approaches. We also demonstrate that our methods yield encouraging results on real-world applications.

Computer Vision and Pattern Recognition, Image and Video Processing

Reveal of Domain Effect: How Visual Restoration Contributes to Object Detection in Aquatic Scenes

52 - Xingyu Chen , Yue Lu , Zhengxing Wu 2020

Underwater robotic perception usually requires visual restoration and object detection, both of which have been studied for many years. Meanwhile, data domain has a huge impact on modern data-driven learning process. However, exactly indicating domain effect, the relation between restoration and detection remains unclear. In this paper, we generally investigate the relation of quality-diverse data domain to detection performance. In the meantime, we unveil how visual restoration contributes to object detection in real-world underwater scenes. According to our analysis, five key discoveries are reported: 1) Domain quality has an ignorable effect on within-domain convolutional representation and detection accuracy; 2) low-quality domain leads to higher generalization ability in cross-domain detection; 3) low-quality domain can hardly be well learned in a domain-mixed learning process; 4) degrading recall efficiency, restoration cannot improve within-domain detection accuracy; 5) visual restoration is beneficial to detection in the wild by reducing the domain shift between training data and real-world scenes. Finally, as an illustrative example, we successfully perform underwater object detection with an aquatic robot.

Computer Vision and Pattern Recognition

Rethinking Temporal Object Detection from Robotic Perspectives

64 - Xingyu Chen , Zhengxing Wu , Junzhi Yu 2019

Video object detection (VID) has been vigorously studied for years, but almost all literature adopts a static accuracy-based evaluation, i.e., average precision (AP). From a robotic perspective, recall continuity and localization stability are as important as accuracy, but AP is insufficient to reflect a detector's performance across time. In this paper, non-reference assessments are proposed for continuity and stability based on object tracklets. These temporal evaluations can serve as supplements to static AP. Further, we develop an online tracklet refinement for improving a detector's temporal performance through short tracklet suppression, fragment filling, and temporal location fusion. In addition, we propose a small-overlap suppression to extend VID methods to the single object tracking (SOT) task, so that a flexible SOT-by-detection framework is formed. Extensive experiments are conducted on the ImageNet VID dataset and real-world robotic tasks, where the superiority of our proposed approaches is validated and verified. Codes will be publicly available.

Computer Vision and Pattern Recognition, Robotics
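Two of the refinement steps named in the abstract above, short tracklet suppression and fragment filling, can be sketched as follows. This is only an illustration under stated assumptions: a tracklet is represented as a hypothetical mapping from frame index to an `(x1, y1, x2, y2)` box, the thresholds `min_len` and `max_gap` are invented, and linear interpolation is a swapped-in choice for filling gaps that the abstract does not specify.

```python
def suppress_short_tracklets(tracklets, min_len=3):
    """Drop tracklets observed in fewer than `min_len` frames."""
    return [t for t in tracklets if len(t) >= min_len]


def fill_fragments(tracklet, max_gap=2):
    """Fill gaps of at most `max_gap` missing frames by linear interpolation.

    `tracklet` maps frame index -> (x1, y1, x2, y2).
    """
    frames = sorted(tracklet)
    filled = dict(tracklet)
    for prev, nxt in zip(frames, frames[1:]):
        gap = nxt - prev - 1
        if 0 < gap <= max_gap:
            for frame in range(prev + 1, nxt):
                weight = (frame - prev) / (nxt - prev)
                filled[frame] = tuple(
                    (1 - weight) * a + weight * b
                    for a, b in zip(tracklet[prev], tracklet[nxt])
                )
    return filled


track = {0: (0, 0, 10, 10), 2: (2, 2, 12, 12)}
refined = fill_fragments(track)
```

Here frame 1 is missing from `track`, so `fill_fragments` inserts a box halfway between the boxes at frames 0 and 2.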

The metastable Q $^3\Delta_2$ state of ThO: A new resource for the ACME electron EDM search

177 - Xing Wu , Zhen Han , James Chow 2019

The best upper limit for the electron electric dipole moment was recently set by the ACME collaboration. This experiment measures an electron spin-precession in a cold beam of ThO molecules in their metastable $H~(^3\Delta_1)$ state. Improvement in the statistical and systematic uncertainties is possible with more efficient use of molecules from the source and better magnetometry in the experiment, respectively. Here, we report measurements of several relevant properties of the long-lived $Q~(^3\Delta_2)$ state of ThO, and show that this state is a very useful resource for both these purposes. The $Q$ state lifetime is long enough that its decay during the time of flight in the ACME beam experiment is negligible. The large electric dipole moment measured for the $Q$ state, giving rise to a large linear Stark shift, is ideal for an electrostatic lens that increases the fraction of molecules detected downstream. The measured magnetic moment of the $Q$ state is also large enough to be used as a sensitive co-magnetometer in ACME. Finally, we show that the $Q$ state has a large transition dipole moment to the $C~(^1\Pi_1)$ state, which allows for efficient population transfer between the ground state $X~(^1\Sigma^+)$ and the $Q$ state via $X-C-Q$ Stimulated Raman Adiabatic Passage (STIRAP). We demonstrate $90\,$% STIRAP transfer efficiency. In the course of these measurements, we also determine the magnetic moment of the $C$ state, the $X\rightarrow C$ transition dipole moment, and branching ratios of decays from the $C$ state.

Atomic Physics, High Energy Physics - Experiment, Quantum Physics

TransSent: Towards Generation of Structured Sentences with Discourse Marker

78 - Xing Wu , Dongjun Wei , Liangjun Zang 2019

Structured sentences are important expressions in human writings and dialogues. Previous works on neural text generation fused semantic and structural information by encoding the entire sentence into a mixed hidden representation. However, when a generated sentence becomes complicated, its structure is difficult to maintain properly. To alleviate this problem, we explicitly separate the modeling process of semantic and structural information. Intuitively, humans generate structured sentences by directly connecting discourses with discourse markers (such as and, but, etc.). Therefore, we propose a task that mimics this process, called discourse transfer. This task represents a structured sentence as (head discourse, discourse marker, tail discourse), and aims at tail discourse generation based on head discourse and discourse marker. We also propose a corresponding model called TransSent, which interprets the relationship between two discourses as a translation from the head discourse to the tail discourse in the embedding space. We evaluate TransSent not only on the discourse transfer task but also on free text generation and dialogue generation tasks. Automatic and human evaluation results show that TransSent can generate structured sentences with high quality, and has certain scalability in different tasks.

Computation and Language, Artificial Intelligence

Mask and Infill: Applying Masked Language Model to Sentiment Transfer

255 - Xing Wu , Tao Zhang , Liangjun Zang 2019

This paper focuses on the task of sentiment transfer on non-parallel text, which modifies sentiment attributes (e.g., positive or negative) of sentences while preserving their attribute-independent content. Due to the limited capability of the RNN-based encoder-decoder structure to capture deep and long-range dependencies among words, previous works can hardly generate satisfactory sentences from scratch. When humans convert the sentiment attribute of a sentence, a simple but effective approach is to replace only the original sentimental tokens in the sentence with target sentimental expressions, instead of building a new sentence from scratch. Such a process is very similar to the task of Text Infilling or Cloze, which could be handled by a deep bidirectional Masked Language Model (e.g. BERT). So we propose a two-step approach, Mask and Infill. In the mask step, we separate style from content by masking the positions of sentimental tokens. In the infill step, we retrofit MLM to Attribute Conditional MLM, to infill the masked positions by predicting words or phrases conditioned on the context and target sentiment. We evaluate our model on two review datasets with quantitative, qualitative, and human evaluations. Experimental results demonstrate that our models improve state-of-the-art performance.

Computation and Language
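The mask step described in the Mask and Infill abstract can be illustrated with a minimal sketch. The tiny `SENTIMENT_LEXICON` and the function `mask_sentiment_tokens` are hypothetical: the abstract does not say how sentimental tokens are identified, so a fixed word list stands in for that component here. The infill step, which needs an attribute-conditional masked language model, is not shown.

```python
# Hypothetical sentiment lexicon, used only for illustration.
SENTIMENT_LEXICON = {"great", "delicious", "terrible", "awful", "friendly", "rude"}


def mask_sentiment_tokens(tokens, lexicon=SENTIMENT_LEXICON, mask_token="[MASK]"):
    """Replace tokens found in `lexicon` with `mask_token`, keeping the rest."""
    return [mask_token if token.lower() in lexicon else token
            for token in tokens]


review = "The staff was rude and the food was terrible".split()
masked = mask_sentiment_tokens(review)
```

The masked sequence keeps the attribute-independent content ("The staff was ... and the food was ...") and leaves only the masked positions to be filled with expressions of the target sentiment.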

MIN: Co-Governing Multi-Identifier Network Architecture and its Prototype on Operators Network

106 - Hui Li , Jiangxing Wu , Xin Yang 2019

The IP protocol is the core of the TCP/IP network layer. However, since IP addresses and their domain names are allocated and managed by a single agency, there are risks of centralization. The semantic overload of the IP address also reduces its scalability and mobility, which further hinders security. This paper proposes a co-governing Multi-Identifier Network (MIN) architecture that constructs a network layer with parallel coexistence of multiple identifiers, including identity, content, geographic information, and IP address. On the management plane, we develop an efficient management system using a consortium blockchain with voting consensus, so the network can be simultaneously managed and supported by hundreds or thousands of nodes with high throughput. On the data plane, we propose an algorithm merging hash table and prefix tree (HTP) for the FIB, which avoids false-negative errors and can inter-translate different identifiers with tens of billions of entries. Further, we propose a scheme to transport IP packets using CCN as a tunnel to support progressive deployment. We deployed the prototype of MIN on the largest operator network in Mainland China, Hong Kong, and Macao, and demonstrated that the network can register identifiers under the co-governing consensus algorithm and supports VoD service very well.

Networking and Internet Architecture
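The MIN abstract above mentions a FIB lookup algorithm that merges a hash table with a prefix tree (HTP) but gives no details. The sketch below is therefore not HTP itself; it is one simple way to combine the two structures, assuming hierarchical names split on `/`: a prefix tree whose child pointers live in hash tables (Python dicts), queried by longest-prefix match. The class name, method names, and example identifiers are all hypothetical.

```python
class PrefixTreeFib:
    """Toy forwarding table: a prefix tree with hash-table child lookups."""

    def __init__(self):
        self.root = {"children": {}, "next_hop": None}

    @staticmethod
    def _components(name):
        # Split "/video/news" into ["video", "news"], ignoring empty parts.
        return [part for part in name.split("/") if part]

    def insert(self, name, next_hop):
        """Register `next_hop` for the prefix `name`."""
        node = self.root
        for component in self._components(name):
            node = node["children"].setdefault(
                component, {"children": {}, "next_hop": None}
            )
        node["next_hop"] = next_hop

    def longest_prefix_match(self, name):
        """Return the next hop of the longest registered prefix, or None."""
        node = self.root
        best = node["next_hop"]
        for component in self._components(name):
            node = node["children"].get(component)
            if node is None:
                break
            if node["next_hop"] is not None:
                best = node["next_hop"]
        return best


fib = PrefixTreeFib()
fib.insert("/video", "face0")
fib.insert("/video/news", "face1")
```

Each step down the tree costs one hash lookup, so a query takes time proportional to the number of name components rather than to the number of stored entries.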

The Prototype of Decentralized Multilateral Co-Governing Post-IP Internet Architecture and Its Testing on Operator Networks

86 - Hui Li , Jiangxing Wu , Kaixuan Xing 2019

The Internet has become the most important infrastructure of modern society, while the existing IP network is unable to provide high-quality service. The unilaterally governed IP network is also unable to satisfy the demands of most nations for co-managing and co-governing cyberspace. Facing this challenge, we propose a novel Decentralized Multilateral Co-Governing Post-IP Internet architecture. To verify its effectiveness, we develop the prototype on operator networks in Mainland China, Hong Kong, and Macao. The experiments and testing results show that this architecture is feasible for the coexistence of Content-Centric Networking and the IP network, and it might become a Chinese Solution to the world.

Networking and Internet Architecture
