We consider a measurement-constrained supervised learning problem, in which (1) the full sample of predictors is given, and (2) the response observations are unavailable and expensive to measure. Thus, it is ideal to select a subsample of predictor observations, measure the corresponding responses, and then fit the supervised learning model on the subsample of predictors and responses. However, model fitting is a trial-and-error process, and a postulated model for the data could be misspecified. Our empirical studies demonstrate that most existing subsampling methods perform unsatisfactorily when the model is misspecified. In this paper, we develop a novel subsampling method, called LowCon, which outperforms the competing methods when the working linear model is misspecified. Our method uses orthogonal Latin hypercube designs to achieve robust estimation. We show that the proposed design-based estimator approximately minimizes the so-called worst-case bias with respect to many possible misspecification terms. Both simulated and real-data analyses demonstrate that the proposed estimator is more robust than several subsample least squares estimators obtained by state-of-the-art subsampling methods.
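As a rough illustration of the design-based subsampling idea described in the abstract above, the following Python sketch selects the data points nearest to a space-filling design and fits least squares on that subsample only. It is not the LowCon algorithm itself: it assumes a plain (non-orthogonal) Latin hypercube, a simple nearest-neighbor matching rule, and synthetic data, and the function names `latin_hypercube`, `design_based_subsample`, and `fit_least_squares` are hypothetical.

```python
import numpy as np

def latin_hypercube(n_points, n_dims, rng):
    """Plain (not orthogonal) Latin hypercube on [0, 1)^d (an assumption)."""
    # In each dimension, every one of the n_points strata is used exactly once.
    strata = np.stack([rng.permutation(n_points) for _ in range(n_dims)], axis=1)
    # Jitter each point uniformly inside its stratum.
    return (strata + rng.random((n_points, n_dims))) / n_points

def design_based_subsample(X, n_sub, rng):
    """Return indices of distinct rows of X nearest to the design points."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    X_unit = (X - lo) / span                # rescale predictors to the unit cube
    design = latin_hypercube(n_sub, X.shape[1], rng)
    available = np.ones(len(X), dtype=bool)
    chosen = []
    for point in design:
        dist = np.sum((X_unit - point) ** 2, axis=1)
        dist[~available] = np.inf           # never pick the same row twice
        idx = int(np.argmin(dist))
        chosen.append(idx)
        available[idx] = False
    return np.array(chosen)

def fit_least_squares(X, y):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

# Synthetic example: all predictors are known, responses are "measured"
# only for the selected subsample.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
idx = design_based_subsample(X, 50, rng)
y_sub = 1.0 + X[idx] @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=50)
beta = fit_least_squares(X[idx], y_sub)
```

Because the selected points are spread across the predictor space instead of being concentrated where the data are dense, the fit depends less on any single region, which is the intuition behind using a space-filling design for robustness.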
The number density and correlation function of galaxies are two key quantities that characterize the distribution of the observed galaxy population. High-$z$ spectroscopic surveys, which usually involve complex target selection and are incomplete in redshift sampling, present both opportunities and challenges for measuring these quantities reliably in the high-$z$ Universe. Using realistic mock catalogs, we show that target selection and redshift incompleteness can lead to significantly biased results. We develop methods to correct such bias, using information provided by the parent photometric data from which the spectroscopic sample is constructed. Our tests using realistic mock samples show that our methods are able to reproduce the true stellar mass function and correlation function reliably. As applications, mock catalogs are constructed for two high-$z$ surveys: the existing zCOSMOS-bright galaxy sample and the forthcoming PFS galaxy evolution survey. We apply our methods to the zCOSMOS-bright sample and compare with previously obtained results. The same set of mock samples is used to quantify the cosmic variance expected for different sample sizes. We find that, for both number density and correlation function, the relative error due to cosmic variance in the PFS galaxy survey will be reduced by a factor of 3-4 compared to zCOSMOS.