Research papers, master's and doctoral theses published by Yu Dong

Cross-correlation of Planck CMB lensing with DESI galaxy groups

104 - Zeyang Sun 2021

We measure the cross-correlation between galaxy groups constructed from DESI Legacy Imaging Survey DR8 and Planck CMB lensing, over overlapping sky area of 16876 $\rm deg^2$. The detections are significant and consistent with the expected signal of the large scale structure of the universe, over group samples of various redshift, mass and richness $N_{\rm g}$ and over various scale cuts. The overall S/N is 39 for a conservative sample with $N_{\rm g}\geq 5$, and increases to $48$ for the sample with $N_{\rm g}\geq 2$. Adopting the Planck 2018 cosmology, we constrain the density bias of groups with $N_{\rm g}\geq 5$ as $b_{\rm g}=1.31\pm 0.10$, $2.22\pm 0.10$, $3.52\pm 0.20$ at $0.1<z\leq 0.33$, $0.33<z\leq 0.67$, $0.67<z\leq 1$ respectively. The value-added group catalog allows us to detect the dependence of bias on group mass with high significance. It also allows us to compare the measured bias with the theoretically predicted one using the estimated group mass. We find excellent agreement for the two high redshift bins. However, it is lower than the theory by $\sim 3\sigma$ for the lowest redshift bin. Another interesting finding is the significant impact of the thermal Sunyaev Zeldovich (tSZ). It contaminates the galaxy group-CMB lensing cross-correlation at $\sim 30\%$ level, and must be deprojected first in CMB lensing reconstruction.

Cosmology and Nongalactic Astrophysics

FORTAP: Using Formulae for Numerical-Reasoning-Aware Table Pretraining

159 - Zhoujun Cheng , Haoyu Dong , Fan Cheng 2021

Tables store rich numerical data, but numerical reasoning over tables is still a challenge. In this paper, we find that the spreadsheet formula, which performs calculations on numerical values in tables, is naturally a strong supervision of numerical reasoning. More importantly, large amounts of spreadsheets with expert-made formulae are available on the web and can be obtained easily. FORTAP is the first method for numerical-reasoning-aware table pretraining that leverages a large corpus of spreadsheet formulae. We design two formula pretraining tasks to explicitly guide FORTAP to learn numerical reference and calculation in semi-structured tables. FORTAP achieves state-of-the-art results on two representative downstream tasks, cell type classification and formula prediction, showing the great potential of numerical-reasoning-aware pretraining.

Information Retrieval, Machine Learning

New Perspective on Progressive GANs Distillation for One-class Novelty Detection

98 - Zhiwei Zhang , Yu Dong , Hanyu Peng 2021

One-class novelty detection is conducted to identify anomalous instances, with different distributions from the expected normal instances. In this paper, the Generative Adversarial Network based on the Encoder-Decoder-Encoder scheme (EDE-GAN) achieves state-of-the-art performance. The two factors below serve the above purpose: 1) The EDE-GAN calculates the distance between two latent vectors as the anomaly score, unlike previous methods that use the reconstruction error between images. 2) The model obtains its best results when the batch size is set to 1. To illustrate their superiority, we design a new GAN architecture and compare performances according to different batch sizes. Moreover, our experimental results provide evidence of how beneficial constraints on the latent space are during model training. In an attempt to learn compact and fast models, we present a new technology, Progressive Knowledge Distillation with GANs (P-KDGAN), which connects two standard GANs through the designed distillation loss. Two-step progressive learning continuously augments the performance of student GANs, with improved results over the single-step approach. Our experimental results on CIFAR-10, MNIST, and FMNIST datasets illustrate that P-KDGAN improves the performance of the student GAN by 2.44%, 1.77%, and 1.73% when compressing the computation at ratios of 24.45:1, 311.11:1, and 700:1, respectively.
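
The latent-distance anomaly score described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the linear maps `W_enc1`, `W_dec` and `W_enc2` are hypothetical stand-ins for trained networks, the dimensions are arbitrary, and the L2 norm is one possible choice of distance (the abstract does not say which distance is used).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy stand-ins for trained networks (not the paper's architecture).
D, K = 16, 4                      # assumed input and latent dimensions
W_enc1 = rng.normal(size=(K, D))  # first encoder
W_dec = rng.normal(size=(D, K))   # decoder
W_enc2 = rng.normal(size=(K, D))  # second encoder

def anomaly_score(x):
    """L2 distance between the latent code of x and that of its reconstruction."""
    z = W_enc1 @ x          # encode the input
    x_hat = W_dec @ z       # decode back to input space
    z_hat = W_enc2 @ x_hat  # re-encode the reconstruction
    return float(np.linalg.norm(z - z_hat))

def reconstruction_error(x):
    """Image-space error, the quantity the abstract contrasts the latent score with."""
    x_hat = W_dec @ (W_enc1 @ x)
    return float(np.linalg.norm(x - x_hat))
```

In a trained model, inputs resembling the normal class would be expected to give small scores and novel inputs larger ones; with the random weights used here the values carry no meaning beyond showing the data flow.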

Computer Vision and Pattern Recognition

Superconductivity in the vicinity of an isospin-polarized state in a cubic Dirac band

62 - Zhiyu Dong , Leonid Levitov 2021

We present a theory of superconducting pairing originating from soft critical fluctuations near isospin-polarized states in rhombohedral trilayer graphene. Using a symmetry-based approach, we determine possible isospin order types and derive the effective electron-electron interactions mediated by isospin fluctuations. Superconductivity arising due to these interactions has symmetry and order parameter structure that depend in a unique way on the mother isospin order. This model naturally leads to a superconducting phase adjacent to the isospin-ordering phase transition, which mimics the behavior observed in experiment. The symmetry of the paired state predicted for the isospin order type inferred in experiments matches the observations. These findings support a scenario of superconductivity originating from electron-electron interactions.

Superconductivity, Mesoscale and Nanoscale Physics

Adversarial Example Devastation and Detection on Speech Recognition System by Adding Random Noise

219 - Mingyu Dong , Diqun Yan , Yongkang Gong 2021

Automatic speech recognition (ASR) systems based on deep neural networks are easily attacked by adversarial examples because of the vulnerability of neural networks, which has been a hot topic in recent years. Adversarial examples harm the ASR system, and serious consequences follow in particular when a command-dependent ASR goes wrong. To improve the robustness and security of the ASR system, defense methods against adversarial examples must be proposed. Based on this idea, we propose an algorithm for the devastation and detection of adversarial examples that can attack current advanced ASR systems. We choose advanced text-dependent and command-dependent ASR systems as our target systems. We generate adversarial examples with the OPT on the text-dependent ASR and with the GA-based algorithm on the command-dependent ASR. The main idea of our method is input transformation of the adversarial examples. Different random intensities and kinds of noise are added to the adversarial examples to devastate the perturbation previously added to the normal examples. The experimental results show that the method performs well. For the devastation of examples, the original speech similarity before and after adding noise can reach 99.68%, the similarity of the adversarial examples can reach 0%, and the detection rate of the adversarial examples can reach 94%.
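
The input-transformation idea in this abstract (add random noise, then compare transcripts before and after) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses only white Gaussian noise at one fixed signal-to-noise ratio, whereas the abstract mentions different intensities and kinds of noise; `transcribe` is a hypothetical callable standing in for an ASR system, and the `snr_db` and `threshold` defaults are arbitrary.

```python
import difflib
import numpy as np

def add_noise(audio, snr_db, rng):
    """Add white Gaussian noise scaled to a target signal-to-noise ratio in dB."""
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=audio.shape)
    return audio + noise

def similarity(a, b):
    """Character-level similarity of two transcripts, in [0, 1]."""
    return difflib.SequenceMatcher(None, a, b).ratio()

def is_adversarial(audio, transcribe, snr_db=20.0, threshold=0.5, seed=0):
    """Flag the input if its transcript changes substantially once noise is added.

    `transcribe` is a hypothetical ASR function mapping a waveform to text.
    """
    rng = np.random.default_rng(seed)
    before = transcribe(audio)
    after = transcribe(add_noise(audio, snr_db, rng))
    return similarity(before, after) < threshold
```

The design rests on the observation reported in the abstract: the transcript of normal speech stays almost unchanged under added noise, while the transcript of an adversarial example does not.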

Sound, Cryptography and Security, Multimedia

HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation

109 - Zhoujun Cheng , Haoyu Dong , Zhiruo Wang 2021

Tables are often created with hierarchies, but existing works on table reasoning mainly focus on flat tables and neglect hierarchical tables. Hierarchical tables challenge existing methods by hierarchical indexing, as well as implicit relationships of calculation and semantics. This work presents HiTab, a free and open dataset to study question answering (QA) and natural language generation (NLG) over hierarchical tables. HiTab is a cross-domain dataset constructed from a wealth of statistical reports (analyses) and Wikipedia pages, and has unique characteristics: (1) nearly all tables are hierarchical; (2) both target sentences for NLG and questions for QA are revised from original, meaningful, and diverse descriptive sentences authored by analysts and professionals of reports; and (3) to reveal complex numerical reasoning in statistical analyses, we provide fine-grained annotations of entity and quantity alignment. HiTab provides 10,686 QA pairs and descriptive sentences with well-annotated quantity and entity alignment on 3,597 tables with broad coverage of table hierarchies and numerical reasoning types. Targeting hierarchical structure, we devise a novel hierarchy-aware logical form for symbolic reasoning over tables, which shows high effectiveness. Targeting complex numerical reasoning, we propose partially supervised training given annotations of entity and quantity alignment, which helps models to largely reduce spurious predictions in the QA task. In the NLG task, we find that entity and quantity alignment also helps NLG models to generate better results in a conditional generation setting. Experiment results of state-of-the-art baselines suggest that this dataset presents a strong challenge and a valuable benchmark for future research.

Computation and Language, Information Retrieval

LSENet: Location and Seasonality Enhanced Network for Multi-Class Ocean Front Detection

262 - Cui Xie , Hao Guo , Junyu Dong 2021

Ocean fronts can cause the accumulation of nutrients and affect the propagation of underwater sound, so high-precision ocean front detection is of great significance to the marine fishery and national defense fields. However, current ocean front detection methods either have low detection accuracy or can only detect the occurrence of an ocean front by binary classification, rarely considering the differences in the characteristics of multiple ocean fronts in different sea areas. To solve these problems, we propose a semantic segmentation network called location and seasonality enhanced network (LSENet) for multi-class ocean front detection at pixel level. In this network, we first design a channel supervision unit structure, which integrates the seasonal characteristics of the ocean front itself and the contextual information to improve the detection accuracy. We also introduce a location attention mechanism to adaptively assign attention weights to the fronts according to the sea areas where they frequently occur, which can further improve the accuracy of multi-class ocean front detection. Compared with other semantic segmentation methods and current representative ocean front detection methods, the experimental results demonstrate convincingly that our method is more effective.

Computer Vision and Pattern Recognition, Artificial Intelligence

Logic-Consistency Text Generation from Semantic Parses

506 - Chang Shu , Yusen Zhang , Xiangyu Dong 2021

Text generation from semantic parses is to generate textual descriptions for formal representation inputs such as logic forms and SQL queries. This is challenging for two reasons: (1) the complex and intensive inner logic with the data scarcity constraint, (2) the lack of automatic evaluation metrics for logic consistency. To address these two challenges, this paper first proposes SNOWBALL, a framework for logic consistent text generation from semantic parses that employs an iterative training procedure by recursively augmenting the training set with quality control. Second, we propose a novel automatic metric, BLEC, for evaluating the logical consistency between the semantic parses and generated texts. The experimental results on two benchmark datasets, Logic2Text and Spider, demonstrate that the SNOWBALL framework enhances the logic consistency on both BLEC and human evaluation. Furthermore, our statistical analysis reveals that BLEC is more logically consistent with human evaluation than general-purpose automatic metrics including BLEU, ROUGE, and BLEURT. Our data and code are available at https://github.com/Ciaranshu/relogic.

Computation and Language

Detection of cross-correlation between CMB Lensing and low-density points

90 - Fuyu Dong , Pengjie Zhang , Le Zhang 2021

Low Density Points (LDPs, \citet{2019ApJ...874....7D}), obtained by removing high-density regions of observed galaxies, can trace the Large-Scale Structures (LSSs) of the universe. In particular, it offers an intriguing opportunity to detect weak gravitational lensing from low-density regions. In this work, we investigate tomographic cross-correlation between Planck CMB lensing maps and LDP-traced LSSs, where LDPs are constructed from the DR8 data release of the DESI legacy imaging survey, with about $10^6$-$10^7$ galaxies. We find that, due to the large sky coverage (20,000 deg$^2$) and large redshift depth ($z\leq 1.2$), a significant detection ($10\sigma$--$30\sigma$) of the CMB lensing-LDP cross-correlation in all six redshift bins can be achieved, with a total significance of $\sim 53\sigma$ over $\ell\le 1024$. Moreover, the measurements are in good agreement with a theoretical template constructed from our numerical simulation in the WMAP 9-year $\Lambda$CDM cosmology. A scaling factor for the lensing amplitude $A_{\rm lens}$ is constrained to $A_{\rm lens}=1\pm 0.12$ for $z<0.2$, $A_{\rm lens}=1.07\pm 0.07$ for $0.2<z<0.4$ and $A_{\rm lens}=1.07\pm 0.05$ for $0.4<z<0.6$, with the r-band absolute magnitude cut of $-21.5$ for LDP selection. A variety of tests have been performed to check the detection reliability, against variations in LDP samples and galaxy magnitude cuts, masks, CMB lensing maps, multipole $\ell$ cuts, sky regions, and photo-z bias. We also perform a cross-correlation measurement between CMB lensing and galaxy number density, which is consistent with the CMB lensing-LDP cross-correlation. This work therefore further convincingly demonstrates that LDP is a competitive tracer of LSS.
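
A scaling factor such as $A_{\rm lens}$ is commonly obtained by fitting a single amplitude that multiplies a fixed theoretical template. The sketch below shows that generic least-squares estimate under a Gaussian likelihood; the abstract does not describe the authors' fitting procedure, so this is an assumed, standard approach, and the names `data`, `template` and `cov` are hypothetical placeholders for a measured band-power vector, the template, and the data covariance.

```python
import numpy as np

def fit_amplitude(data, template, cov):
    """Best-fit amplitude A and its 1-sigma error for the model data = A * template.

    A       = (d^T C^-1 t) / (t^T C^-1 t)
    sigma_A = (t^T C^-1 t)^(-1/2)
    """
    cinv_t = np.linalg.solve(cov, template)  # C^-1 t, without forming the inverse
    norm = template @ cinv_t                 # t^T C^-1 t
    amp = (data @ cinv_t) / norm             # uses the symmetry of C
    return float(amp), float(1.0 / np.sqrt(norm))
```

The ratio of the best-fit amplitude to its error gives the detection significance of the template in the data, which is how a per-bin figure in units of $\sigma$ can be read off from such a fit.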

Cosmology and Nongalactic Astrophysics

TableSense: Spreadsheet Table Detection with Convolutional Neural Networks

171 - Haoyu Dong , Shijie Liu , Shi Han 2021

Spreadsheet table detection is the task of detecting all tables on a given sheet and locating their respective ranges. Automatic table detection is a key enabling technique and an initial step in spreadsheet data intelligence. However, the detection task is challenged by the diversity of table structures and table layouts on the spreadsheet. Considering the analogy between a cell matrix as spreadsheet and a pixel matrix as image, and encouraged by the successful application of Convolutional Neural Networks (CNN) in computer vision, we have developed TableSense, a novel end-to-end framework for spreadsheet table detection. First, we devise an effective cell featurization scheme to better leverage the rich information in each cell; second, we develop an enhanced convolutional neural network model for table detection to meet the domain-specific requirement on precise table boundary detection; third, we propose an effective uncertainty metric to guide an active learning based smart sampling algorithm, which enables the efficient build-up of a training dataset with 22,176 tables on 10,220 sheets with broad coverage of diverse table structures and layouts. Our evaluation shows that TableSense is highly effective with 91.3% recall and 86.5% precision in the EoB-2 metric, a significant improvement over both the current detection algorithms used in commodity spreadsheet tools and state-of-the-art convolutional neural networks in computer vision.

Information Retrieval
