Research papers, master's theses and doctoral dissertations published by Qingyang Li

What to expect from dynamical modelling of cluster haloes I. The information content of different dynamical tracers

89 - Qingyang Li, Jiaxin Han, Wenting Wang 2021

Using hydrodynamical simulations, we study how well the underlying gravitational potential of a galaxy cluster can be modelled dynamically with different types of tracers. In order to segregate different systematics and the effects of varying estimator performance, we first focus on applying a generic minimal-assumption method (oPDF) to model the simulated haloes using the full 6-D phase-space information. We show that the halo mass and concentration can be recovered in an ensemble-unbiased way, with a stochastic bias that varies from halo to halo, mostly reflecting deviations from steady state in the tracer distribution. The typical systematic uncertainty is $\sim 0.17$ dex in both the virial mass and the concentration when dark matter particles are used as tracers. The dynamical state of satellite galaxies is close to that of dark matter particles, while intracluster stars are further from a steady state, resulting in a $\sim 0.26$ dex systematic uncertainty in mass. Compared with galactic haloes hosting Milky-Way-like galaxies, cluster haloes show a larger stochastic bias in the recovered mass profiles. We also test the accuracy of using intracluster gas as a dynamical tracer modelled through a generalised hydrostatic equilibrium equation, and find a systematic uncertainty in the estimated mass comparable to that obtained with dark matter. Lastly, we demonstrate that our conclusions largely apply to other steady-state dynamical models, including the spherical Jeans equation, by quantitatively segregating their statistical efficiencies and robustness to systematics. We also estimate, in each case, the limiting number of tracers beyond which the systematics-dominated regime is reached.

Astrophysics of Galaxies; Cosmology and Nongalactic Astrophysics
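As a reference point for the spherical Jeans equation mentioned in the abstract above, here is a minimal Python sketch of the textbook (non-generalised) Jeans mass estimator, $M(<r) = -\frac{r\sigma_r^2}{G}\left(\frac{d\ln\nu}{d\ln r} + \frac{d\ln\sigma_r^2}{d\ln r} + 2\beta\right)$. It assumes a smooth, tabulated tracer density $\nu(r)$, radial velocity dispersion $\sigma_r^2(r)$ and anisotropy $\beta$. The function name `jeans_mass` and the unit choice (kpc, km/s, solar masses) are illustrative assumptions; this is not the oPDF method or the paper's pipeline.

```python
import numpy as np

# Gravitational constant in kpc (km/s)^2 / Msun
G = 4.30091e-6

def jeans_mass(r, nu, sigma_r2, beta):
    """Enclosed mass M(<r) in Msun from the spherical Jeans equation.

    r        : radii in kpc (strictly increasing)
    nu       : tracer number density at r (any units; only its log-slope matters)
    sigma_r2 : radial velocity dispersion squared, in (km/s)^2
    beta     : velocity anisotropy, scalar or array matching r
    """
    ln_r = np.log(r)
    # Logarithmic slopes by finite differences on the ln r grid
    dln_nu = np.gradient(np.log(nu), ln_r)
    dln_s2 = np.gradient(np.log(sigma_r2), ln_r)
    return -(r * sigma_r2 / G) * (dln_nu + dln_s2 + 2.0 * beta)

# Example: isothermal tracer, nu ~ r^-2, sigma_r = 1000 km/s, isotropic orbits
r = np.logspace(0, 3, 50)
M = jeans_mass(r, r**-2.0, np.full_like(r, 1.0e6), 0.0)  # ~ 2 * sigma_r^2 * r / G
```

For this isothermal, isotropic case the estimator reduces to $M(<r) = 2\sigma_r^2 r/G$, which makes a convenient sanity check; real tracer profiles are noisy, so in practice the density and dispersion would be smoothed or fitted before differentiating.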

RLTIR: Activity-based Interactive Person Identification based on Reinforcement Learning Tree

89 - Qingyang Li, Zhiwen Yu, Lina Yao 2021

Identity recognition plays an important role in ensuring security in daily life. Biometric-based (especially activity-based) approaches are favored for their fidelity, universality, and resilience. However, most existing machine learning-based approaches rely on a traditional workflow in which models are trained once and for all, with limited involvement from end-users and with the dynamic nature of the learning process neglected. The resulting models are static and cannot be updated in time, which often leads to high false-positive or false-negative rates. In practice, therefore, an expert is needed to help provide high-quality observations and interpret model outputs. It is expedient to combine the strengths of human experts with the computational capability of computers, creating a tightly coupled incremental learning process for better performance. In this study, we develop RLTIR, an interactive identity recognition approach based on reinforcement learning that adjusts the identification model using human guidance. We first build a base tree-structured identity recognition model. An expert is then introduced to give feedback on the model outputs, and the model is updated according to strategies learned automatically within a designated reinforcement learning framework. To the best of our knowledge, this is the first attempt to combine human expert knowledge with model learning in the area of identity recognition. The experimental results show that the reinforced interactive identity recognition framework outperforms baseline methods in recognition accuracy and robustness.

Human-Computer Interaction

The Three Hundred Project: the stellar and gas profiles

66 - Qingyang Li, Weiguang Cui, Xiaohu Yang 2020

Using the catalogues of galaxy clusters from The Three Hundred project, modelled with both hydrodynamic simulations (Gadget-X and Gadget-MUSIC) and semi-analytic models (SAMs), we study the scatter and self-similarity of the profiles and distributions of the baryonic components of the clusters: the stellar and gas mass, metallicity, stellar age, gas temperature, and the (specific) star formation rate. Through comparisons with observational results, we find that the shape and scatter of the gas density profiles match the observed trends well, including the reduced scatter at large radii, which is a signature of self-similarity suggested in previous studies. One of our simulation sets, Gadget-X, reproduces the shape of the observed temperature profile well, while Gadget-MUSIC has a higher and flatter profile in the cluster centre and a lower and steeper profile at large radii. The gas metallicity profiles from both simulation sets follow the observed trend but have a relatively lower normalisation. The cumulative stellar density profiles from the SAMs agree better with the observed result than those from both hydrodynamic simulations, which show relatively higher profiles. The scatter in these physical profiles, especially in the cluster centre region, depends on the cluster dynamical state and on the cool-core/non-cool-core dichotomy. The stellar age, metallicity and (s)SFR show very large scatter and are therefore presented as 2D maps. We do not find any clear radial dependence of these properties. However, the brightest central galaxies have distinguishable features compared with the satellite galaxies.

Astrophysics of Galaxies
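One common way to quantify the profile scatter discussed in the abstract above is the standard deviation of $\log_{10}$ of the profiles at each radial bin, quoted in dex. The sketch below assumes the profiles have already been scaled (for example by a critical density and an overdensity radius) and interpolated onto common radial bins; `log_scatter` is a hypothetical helper, not code from The Three Hundred project.

```python
import numpy as np

def log_scatter(profiles):
    """Median profile and 1-sigma scatter (in dex) at each radial bin.

    profiles : array of shape (n_clusters, n_bins) with strictly positive values,
               e.g. scaled gas density profiles on common radial bins.
    """
    logp = np.log10(np.asarray(profiles, dtype=float))
    median = 10.0 ** np.median(logp, axis=0)  # median in log space, back to linear
    scatter_dex = np.std(logp, axis=0)        # population std of the log10 values
    return median, scatter_dex

# Example: two clusters offset by -0.1 and +0.1 dex from a common profile
base = np.array([1e-1, 1e-2, 1e-3])
median, scatter = log_scatter(np.vstack([base * 10**-0.1, base * 10**0.1]))
```

Working in log space keeps bins where the density spans orders of magnitude comparable, so a "reduced scatter at large radii" shows up directly as smaller values of `scatter_dex` in the outer bins.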

Large-scale Feature Selection of Risk Genetic Factors for Alzheimer's Disease via Distributed Group Lasso Regression

160 - Qingyang Li, Dajiang Zhu, Jie Zhang 2017

Genome-wide association studies (GWAS) have achieved great success in the genetic study of Alzheimer's disease (AD). Collaborative imaging genetics studies across different research institutions have shown their effectiveness in detecting genetic risk factors. However, the high dimensionality of GWAS data poses significant challenges in detecting risk SNPs for AD. Selecting relevant features is crucial for predicting the response variable. In this study, we propose a novel Distributed Feature Selection Framework (DFSF) to conduct large-scale imaging genetics studies across multiple institutions. To speed up the learning process, we propose a family of distributed group Lasso screening rules that identify irrelevant features and remove them from the optimization. We then select the relevant group features by performing group Lasso feature selection along a sequence of parameter values. Finally, we employ stability selection to rank the top risk SNPs that might help detect the early stage of AD. To the best of our knowledge, this is the first distributed feature selection model that integrates group Lasso feature selection and detects risk genetic factors across a system of multiple research institutions. Empirical studies are conducted on 809 subjects with 5.9 million SNPs distributed across several individual institutions, demonstrating the efficiency and effectiveness of the proposed method.

Machine Learning
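For readers unfamiliar with group Lasso, the sketch below solves the single-machine problem $\min_w \frac{1}{2}\|y - Xw\|^2 + \lambda\sum_g \sqrt{p_g}\,\|w_g\|_2$ by proximal gradient descent (ISTA) with group soft-thresholding, and computes $\lambda_{\max}$, the smallest $\lambda$ at which every group is zero. That $\lambda_{\max}$ check is the simplest possible screening test; it is not the distributed screening rules or the DFSF framework from the paper, and the function names are hypothetical.

```python
import numpy as np

def group_lambda_max(X, y, groups):
    """Smallest lambda for which w = 0 solves the group Lasso below."""
    return max(np.linalg.norm(X[:, g].T @ y) / np.sqrt(len(g)) for g in groups)

def group_lasso_ista(X, y, groups, lam, n_iter=500):
    """Proximal gradient (ISTA) for
        min_w 0.5*||y - X w||^2 + lam * sum_g sqrt(|g|) * ||w_g||_2
    groups: list of index lists that partition the columns of X.
    """
    w = np.zeros(X.shape[1])
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1/L, L = largest eigenvalue of X^T X
    for _ in range(n_iter):
        v = w - step * (X.T @ (X @ w - y))   # gradient step on the squared loss
        for g in groups:                     # group soft-thresholding (prox step)
            norm = np.linalg.norm(v[g])
            t = step * lam * np.sqrt(len(g))
            v[g] = 0.0 if norm <= t else (1.0 - t / norm) * v[g]
        w = v
    return w

# Example: identity design, two groups of two features
X = np.eye(4)
y = np.array([3.0, 4.0, 0.1, 0.1])
groups = [[0, 1], [2, 3]]
w = group_lasso_ista(X, y, groups, lam=1.0)
```

With an identity design matrix a single iteration already gives the exact solution: each group of $y$ is shrunk towards zero by $\lambda\sqrt{p_g}$ in norm, so the first group survives and the second (norm about 0.14) is zeroed. Running the solver along a decreasing sequence of $\lambda$ values starting just below $\lambda_{\max}$ gives the kind of regularisation path the abstract refers to.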

Large-scale Collaborative Imaging Genetics Studies of Risk Genetic Factors for Alzheimer's Disease Across Multiple Institutions

86 - Qingyang Li, Tao Yang, Liang Zhan 2016

Genome-wide association studies (GWAS) offer new opportunities to identify genetic risk factors for Alzheimer's disease (AD). Recently, collaborative efforts across different institutions have emerged that enhance the power of many existing techniques applied to individual institutions' data. However, a major barrier to collaborative GWAS is that many institutions need to preserve individual data privacy. To address this challenge, we propose a novel distributed framework, termed the Local Query Model (LQM), to detect risk SNPs for AD across multiple research institutions. To accelerate the learning process, we propose a Distributed Enhanced Dual Polytope Projection (D-EDPP) screening rule that identifies irrelevant features and removes them from the optimization. To the best of our knowledge, this is the first successful run of the computationally intensive model selection procedure to learn a consistent model across different institutions without compromising their privacy, while ranking the SNPs that may collectively affect AD. Empirical studies are conducted on 809 subjects with 5.9 million SNP features distributed across three individual institutions. D-EDPP achieved a 66-fold speed-up by effectively identifying irrelevant features.

Machine Learning
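The abstract above does not spell out LQM or D-EDPP, but the basic idea behind distributing a screening rule can be illustrated with the original (non-enhanced) Dual Polytope Projection rule for the Lasso $\min_b \frac{1}{2}\|y - Xb\|^2 + \lambda\|b\|_1$, using $\lambda_{\max} = \|X^\top y\|_\infty$ as the reference point. When subjects (rows) are split across institutions, every quantity this rule needs ($X^\top y$, the column norms of $X$, and $\|y\|$) is a sum of per-institution terms, so only those aggregates have to be pooled. This is a simplified sketch under that assumption, not the paper's D-EDPP rule; the helper names are hypothetical, and pooling aggregates is not by itself a formal privacy guarantee.

```python
import numpy as np

def local_summary(X_k, y_k):
    """Per-institution aggregates: X_k^T y_k, squared column norms, ||y_k||^2."""
    return X_k.T @ y_k, np.sum(X_k ** 2, axis=0), float(y_k @ y_k)

def dpp_keep_mask(summaries, lam):
    """Basic DPP safe screening for min_b 0.5*||y - X b||^2 + lam*||b||_1.

    Uses lambda_max as the reference point. Returns (keep_mask, lambda_max);
    features with keep_mask == False are guaranteed to have b_i = 0 at lam.
    """
    Xty = sum(s[0] for s in summaries)               # X^T y = sum_k X_k^T y_k
    col_norms = np.sqrt(sum(s[1] for s in summaries))
    y_norm = np.sqrt(sum(s[2] for s in summaries))
    lam_max = float(np.max(np.abs(Xty)))
    if lam >= lam_max:                               # b = 0 is optimal: drop everything
        return np.zeros(Xty.shape, dtype=bool), lam_max
    # theta*(lam_max) = y / lam_max, and projection onto the dual feasible set is
    # nonexpansive, so |x_i^T theta*(lam)| <= bound; bound < 1 implies b_i = 0.
    bound = np.abs(Xty) / lam_max + col_norms * y_norm * (1.0 / lam - 1.0 / lam_max)
    return bound >= 1.0, lam_max

# Example: 30 subjects split across three hypothetical institutions
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 50))
y = rng.standard_normal(30)
summaries = [local_summary(X[idx], y[idx]) for idx in np.array_split(np.arange(30), 3)]
_, lam_max = dpp_keep_mask(summaries, np.inf)
keep, _ = dpp_keep_mask(summaries, 0.95 * lam_max)
```

Because the pooled aggregates are exactly the centralised quantities, the distributed mask matches what a single site holding all the data would compute. Enhanced variants such as EDPP tighten the bound so that more features are discarded for the same $\lambda$.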

Stochastic Coordinate Coding and Its Application for Drosophila Gene Expression Pattern Annotation

130 - Binbin Lin, Qingyang Li, Qian Sun 2014

Drosophila melanogaster has been established as a model organism for investigating the fundamental principles of developmental gene interactions. The gene expression patterns of Drosophila melanogaster can be documented as digital images, which are annotated with anatomical ontology terms to facilitate pattern discovery and comparison. The automated annotation of gene expression pattern images has received increasing attention due to the recent expansion of the image database. The effectiveness of gene expression pattern annotation relies on the quality of the feature representation. Previous studies have demonstrated that sparse coding is effective for extracting features from gene expression images. However, solving sparse coding remains computationally challenging, especially when dealing with large-scale data sets and learning large dictionaries. In this paper, we propose a novel algorithm to solve the sparse coding problem, called Stochastic Coordinate Coding (SCC). The proposed algorithm alternately updates the sparse codes via just a few steps of coordinate descent and updates the dictionary via second-order stochastic gradient descent. The computational cost is further reduced by focusing only on the non-zero components of the sparse codes and the corresponding columns of the dictionary in the updating procedure. As a result, the proposed algorithm significantly improves efficiency and scalability, making sparse coding applicable to large-scale data sets and large dictionary sizes. Our experiments on Drosophila gene expression data sets demonstrate the efficiency and effectiveness of the proposed algorithm.

Machine Learning; Computational Engineering, Finance, and Science
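A minimal sketch of the alternating scheme described in the SCC abstract above, assuming the standard objective $\frac{1}{2}\|x - Dz\|^2 + \lambda\|z\|_1$ with unit-norm dictionary columns: each sample gets a few coordinate-descent sweeps on its code (warm-started from the previous epoch), then only the dictionary columns in the code's support are updated. For simplicity it swaps the paper's second-order stochastic gradient step for plain first-order SGD with a fixed learning rate, so it illustrates the structure of SCC rather than reimplementing it; `scc_sketch` and its parameters are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t*|.|."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def scc_sketch(X, n_atoms, lam=0.1, n_epochs=5, cd_sweeps=1, lr=0.05, seed=0):
    """Online dictionary learning in the spirit of SCC (simplified).

    X : array (n_samples, n_features). Returns dictionary D (n_features, n_atoms)
    with unit-norm columns and sparse codes Z (n_samples, n_atoms).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    D = rng.standard_normal((d, n_atoms))
    D /= np.linalg.norm(D, axis=0)
    Z = np.zeros((n, n_atoms))
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            x, z = X[i], Z[i]        # z is a view: codes are warm-started across epochs
            r = x - D @ z            # current residual
            # A few coordinate-descent sweeps on the Lasso code (unit-norm atoms)
            for _ in range(cd_sweeps):
                for j in range(n_atoms):
                    old = z[j]
                    z[j] = soft_threshold(D[:, j] @ r + old, lam)
                    if z[j] != old:
                        r -= (z[j] - old) * D[:, j]  # keep the residual in sync
            # Update only the atoms used by this code (first-order SGD step)
            support = np.flatnonzero(z)
            if support.size:
                D[:, support] += lr * np.outer(r, z[support])
                D[:, support] /= np.linalg.norm(D[:, support], axis=0)
    return D, Z

# Example on random data
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 8))
D, Z = scc_sketch(X, n_atoms=5, lam=0.1)
```

Restricting the dictionary update to `support` is what keeps the per-sample cost proportional to the number of non-zero code entries, which is the saving the abstract highlights; a larger `lam` gives sparser codes and therefore cheaper updates.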
