Finding Quasars behind the Galactic Plane. I. Candidate Selections with Transfer Learning


Abstract in English

Quasars behind the Galactic plane (GPQs) are important astrometric references and useful probes of Milky Way gas. However, the search for GPQs is difficult because of the heavy extinction and high source densities in the Galactic plane. Existing quasar selection methods developed using high Galactic latitude (high-$b$) data cannot be applied to the Galactic plane directly, because the photometric data obtained from high-$b$ regions and from the Galactic plane follow different probability distributions. To alleviate this dataset shift problem in quasar candidate selection, we adopt a transfer learning framework at both the data and algorithm levels. At the data level, to build a training set in which the dataset shift is modeled, we synthesize quasars and galaxies behind the Galactic plane based on SDSS sources and a Galactic dust map. At the algorithm level, to reduce the effect of class imbalance, we transform the three-class classification problem for stars, galaxies, and quasars into two binary classification tasks. We apply the XGBoost algorithm to Pan-STARRS1 (PS1) and AllWISE photometry for classification, and an additional cut on Gaia proper motion to remove stellar contaminants. We obtain a reliable GPQ candidate catalog with 160,946 sources located at $|b|\leq 20^{\circ}$ in the PS1-AllWISE footprint. Photometric redshifts of the GPQ candidates, obtained with the XGBoost regression algorithm, show that our selection method can identify quasars over a wide redshift range ($0<z\lesssim5$). This study extends systematic searches for quasars to dense stellar fields and shows the feasibility of using astronomical knowledge to improve data mining under complex conditions in the Big Data era.
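The two levels described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the extinction coefficients in `R_BAND`, the ordering of the two binary tasks (star vs. extragalactic, then galaxy vs. quasar), the probability threshold `p_min`, and the proper-motion significance limit `pm_sig_max`. The classifier scores are taken as given inputs; in practice they would come from trained models such as XGBoost.

```python
import math

# Illustrative extinction coefficients R_band = A_band / E(B-V) for PS1-like bands.
# Assumed values for this sketch, not taken from the source text.
R_BAND = {"g": 3.17, "r": 2.27, "i": 1.68, "z": 1.32, "y": 1.09}


def redden(mags, ebv):
    """Data level: dim each band by A_band = R_band * E(B-V) to mimic a
    high-b source placed behind dust with reddening E(B-V) = ebv."""
    return {band: m + R_BAND[band] * ebv for band, m in mags.items()}


def pm_significance(pmra, pmdec, pmra_err, pmdec_err):
    """Proper-motion significance, ignoring the RA/Dec error correlation."""
    return math.hypot(pmra / pmra_err, pmdec / pmdec_err)


def is_gpq_candidate(p_extragalactic, p_quasar, pm_sig,
                     p_min=0.5, pm_sig_max=3.0):
    """Algorithm level: combine two binary scores with a proper-motion cut.
    Task ordering and thresholds are hypothetical."""
    return (p_extragalactic >= p_min   # task 1: star vs. extragalactic
            and p_quasar >= p_min      # task 2: galaxy vs. quasar
            and pm_sig < pm_sig_max)   # keep only sources consistent with no motion


# A made-up high-b quasar, then the same source seen through E(B-V) = 1.0.
high_b_quasar = {"g": 19.0, "r": 18.8, "i": 18.7, "z": 18.6, "y": 18.5}
synthetic_gpq = redden(high_b_quasar, ebv=1.0)
```

Because the assumed coefficients decrease toward longer wavelengths, the synthetic source is both fainter and redder than the original, which is the kind of shift a training set built only from high-$b$ data would not contain.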
