Inference of an explanatory variable from observations in a high-dimensional space: application to high-resolution spectra of stars


Abstract in English

Our aim is to evaluate fundamental parameters from the analysis of the electromagnetic spectra of stars. We may use $10^3$-$10^5$ spectra, each spectrum being a vector with $10^2$-$10^4$ coordinates. We thus face the so-called curse of dimensionality. We look for a method to reduce the size of this data space, keeping only the most relevant information. As a reference method, we use principal component analysis (PCA) to reduce dimensionality. However, PCA is an unsupervised method, and the subspace it yielded was not consistent with the parameter. We thus tested a supervised method based on Sliced Inverse Regression (SIR), which provides a subspace consistent with the parameter. It also shares analogies with factorial discriminant analysis: the method slices the database along the parameter variation and builds the subspace that maximizes the inter-slice variance while standardizing the total projected variance of the data. Nevertheless, the performance of SIR was not satisfactory in standard usage, because of the non-monotonicity of the unknown function linking the data to the parameter and because of noise propagation. We show that better performance can be achieved by selecting the most relevant directions for parameter inference. Preliminary tests are performed on synthetic pseudo-line profiles plus noise. Using one direction, we show that, compared to PCA, the error associated with SIR is 50% smaller for a non-linear parameter and 70% smaller for a linear parameter. Moreover, using a selected direction, the error is 80% smaller for a non-linear parameter and 95% smaller for a linear parameter.
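The contrast between the two reductions can be illustrated with a minimal sketch of textbook SIR next to PCA. This is not the implementation used in this work: the function names, the number of slices, the equal-count slicing and the small `ridge` term added to the covariance are all assumptions made for illustration, and the direction-selection step mentioned above is not reproduced. SIR is written here as the generalized eigenproblem $\Gamma v = \lambda \Sigma v$, where $\Sigma$ is the total covariance of the data and $\Gamma$ the covariance of the slice means.

```python
import numpy as np


def pca_directions(X, n_dir=1):
    """Leading principal directions (as columns) of the data X (n x p)."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(X)
    _, vecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    return vecs[:, ::-1][:, :n_dir]          # keep the largest ones


def sir_directions(X, y, n_slices=10, n_dir=1, ridge=1e-8):
    """Textbook SIR directions (as columns) for data X (n x p) and parameter y (n,).

    n_slices and ridge are illustrative choices, not values from the paper.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    # Total covariance; the ridge keeps it positive definite (assumption).
    sigma = Xc.T @ Xc / n + ridge * np.eye(p)

    # Slice the sample along the sorted parameter, with near-equal counts.
    order = np.argsort(y)
    gamma = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Xc[idx].mean(axis=0)             # centered slice mean
        gamma += (len(idx) / n) * np.outer(m, m)

    # Solve gamma v = lambda sigma v by Cholesky whitening: sigma = L L^T,
    # so u = L^T v is an ordinary eigenvector of L^{-1} gamma L^{-T}.
    L = np.linalg.cholesky(sigma)
    L_inv = np.linalg.inv(L)
    _, vecs = np.linalg.eigh(L_inv @ gamma @ L_inv.T)
    U = vecs[:, ::-1][:, :n_dir]
    V = L_inv.T @ U                          # back to the original coordinates
    # Before this rescaling each v satisfies v^T sigma v = 1; unit Euclidean
    # norm is used here only to make directions easy to compare.
    return V / np.linalg.norm(V, axis=0)
```

Projecting the centered data on the first column, `(X - X.mean(axis=0)) @ sir_directions(X, y)[:, 0]`, gives a one-dimensional coordinate on which the parameter can then be regressed. When the number of coordinates approaches or exceeds the number of spectra, the covariance becomes ill-conditioned and a much stronger regularization than the default `ridge` would likely be needed, which is one way to see the noise-propagation issue raised above.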
