We introduce the program MAVKA for determining the characteristics of extrema from observations in adjacent data intervals. It is intended for variable stars but may be used for signals of arbitrary nature. We use a dozen basic functions. Some of them use the interval near the extremum without splitting it (an algebraic polynomial in general form; a symmetrical algebraic polynomial using only even degrees of the time (phase) deviation from the position of the symmetry argument), others split the interval into 2 subintervals (a Taylor series of the New Algol Variable, the function of Prof. Z. Mikulášek), or even 3 parts (Asymptotic Parabola, Wall-Supported Parabola, Wall-Supported Line, Wall-Supported Asymptotic Parabola, Parabolic Spline of defect 1). The variety of methods allows one to choose the best (statistically optimal) approximation for a given data sample. As the criterion, we use the accuracy of the determination of the extremum. Statistical errors are determined for all parameters. The methods are illustrated by applications to observations of pulsating and eclipsing variable stars, as well as to exoplanet transits. They are used in the international campaigns Inter-Longitude Astronomy, Virtual Observatory and AstroInformatics. The program may be used for studies of individual objects, including with data from ground-based (NSVS, ASAS, WASP, CRTS et al.) and space (GAIA, KEPLER, HIPPARCOS/TYCHO, WISE et al.) surveys.
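As a rough illustration of the simplest case described above (an algebraic polynomial fitted to the interval near an extremum, with a statistical error for the extremum time), the following Python sketch fits a parabola by least squares and propagates the coefficient covariance to the extremum position. It is not MAVKA's code: the function name `fit_parabola_extremum`, the restriction to degree 2, and the synthetic data are assumptions made for this example.

```python
import numpy as np

def fit_parabola_extremum(t, m):
    """Fit m = c0 + c1*x + c2*x**2 with x = t - mean(t).

    Returns (time of extremum, value at extremum, 1-sigma error of the time).
    Hypothetical helper for illustration; assumes equal weights and c2 != 0.
    """
    t = np.asarray(t, dtype=float)
    m = np.asarray(m, dtype=float)
    t0 = t.mean()                      # centre the argument for conditioning
    x = t - t0
    A = np.column_stack([np.ones_like(x), x, x**2])
    coef, *_ = np.linalg.lstsq(A, m, rcond=None)
    resid = m - A @ coef
    dof = len(t) - 3                   # 3 fitted coefficients
    sigma2 = float(resid @ resid) / dof
    cov = sigma2 * np.linalg.inv(A.T @ A)
    c0, c1, c2 = coef
    xe = -c1 / (2.0 * c2)              # stationary point of the parabola
    # Linear error propagation: gradient of xe with respect to (c0, c1, c2)
    grad = np.array([0.0, -1.0 / (2.0 * c2), c1 / (2.0 * c2**2)])
    err = float(np.sqrt(grad @ cov @ grad))
    return t0 + xe, c0 + c1 * xe + c2 * xe**2, err

# Synthetic minimum at t = 2.5 (assumed data, not real observations)
rng = np.random.default_rng(0)
t_obs = np.linspace(2.0, 3.0, 41)
m_obs = 10.0 + 3.0 * (t_obs - 2.5)**2 + rng.normal(0.0, 0.01, t_obs.size)
t_ext, m_ext, t_err = fit_parabola_extremum(t_obs, m_obs)
print(f"extremum at t = {t_ext:.4f} +/- {t_err:.4f}, value = {m_ext:.4f}")
```

The same error estimate could be computed for each candidate function, and the one giving the smallest error of the extremum time kept, in the spirit of the criterion stated in the abstract.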
The advanced MAVKA software for approximating observations of extrema is used to analyze the brightness variability of pulsating and eclipsing stars, but may be useful in analyzing signals of any nature. A new algorithm using a parabolic (quadratic) spline is proposed. In contrast to the traditional definition of a spline as a piecewise-defined function on fixed intervals, the proposed spline is divided into three intervals, with the positions of the boundaries between the intervals treated as additional parameters. The spline defect is 1, that is, the function and its first derivative are continuous and the second derivative can be discontinuous at the boundaries. Such a function is an enhancement of the asymptotic parabola (Marsakova and Andronov 1996). The dependence of the approximation accuracy for a fixed signal on the location of the interval boundaries is considered. The parameter accuracy estimates obtained with the least squares method and with the bootstrap are compared. The variability of the semi-regular pulsating star Z UMa is analyzed. The object is shown to have multicomponent variability, including four periodic oscillations and significant variability of the amplitudes and phases of the individual oscillations.
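The idea of a three-interval parabolic spline of defect 1 with free boundaries can be sketched as follows. This is an illustrative parameterization, not necessarily the one used in MAVKA: the spline is written in a truncated-power basis (which makes the function and its first derivative continuous automatically), the coefficients are found by linear least squares for fixed boundaries, and the two boundaries `k1 < k2` are chosen by a brute-force grid search. The function names, the grid search, and the test signal are assumptions.

```python
import numpy as np

def spline_design(t, k1, k2):
    """Design matrix of a quadratic spline with knots k1 < k2.

    Basis: 1, t, t^2, (t-k1)_+^2, (t-k2)_+^2. Only the second derivative
    can jump at the knots, so the spline has defect 1.
    """
    t = np.asarray(t, dtype=float)
    return np.column_stack([
        np.ones_like(t), t, t**2,
        np.clip(t - k1, 0.0, None)**2,
        np.clip(t - k2, 0.0, None)**2,
    ])

def fit_parabolic_spline(t, y, n_grid=15):
    """Grid search over the two boundaries; linear least squares inside.

    Returns (sum of squared residuals, k1, k2, coefficients).
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    grid = np.linspace(t.min(), t.max(), n_grid + 2)[1:-1]  # interior points
    best = None
    for i, k1 in enumerate(grid):
        for k2 in grid[i + 1:]:
            A = spline_design(t, k1, k2)
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            sse = float(np.sum((y - A @ coef)**2))
            if best is None or sse < best[0]:
                best = (sse, float(k1), float(k2), coef)
    return best

def spline_minimum(coef, k1, k2, t_lo, t_hi, n=2001):
    """Locate the minimum of the fitted spline on a dense grid."""
    tt = np.linspace(t_lo, t_hi, n)
    yy = spline_design(tt, k1, k2) @ coef
    i = int(np.argmin(yy))
    return float(tt[i]), float(yy[i])

# Assumed test signal: two straight lines joined by a parabola
# (an asymptotic-parabola shape) plus a little noise.
rng = np.random.default_rng(1)
t_obs = np.linspace(-2.0, 2.0, 81)
y_true = (1.0 - t_obs + np.clip(t_obs + 0.5, 0.0, None)**2
          - np.clip(t_obs - 0.5, 0.0, None)**2)
y_obs = y_true + rng.normal(0.0, 0.01, t_obs.size)
sse, k1, k2, coef = fit_parabolic_spline(t_obs, y_obs)
print("boundaries:", k1, k2)
print("minimum:", spline_minimum(coef, k1, k2, t_obs.min(), t_obs.max()))
```

A bootstrap error estimate of the kind mentioned in the abstract could be obtained by resampling the `(t_obs, y_obs)` pairs with replacement, refitting, and taking the scatter of the resulting extremum times.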
The software MAVKA is described, which was developed for statistically optimal determination of the characteristics of the extrema of 1000+ variable stars of different types, mainly eclipsing and pulsating. The approximations are phenomenological rather than physical, because the discovery of a new variable star is often made from time series of single-filter (single-channel) data, and there is no possibility to determine the parameters needed for physical modelling (e.g. temperature, radial velocities, mass ratio of binaries). Besides the classical polynomial approximation AP (we limited the degree of the polynomial to the range from 2 to 9), the software implements symmetrical approximations (symmetrical polynomials SP, wall-supported horizontal line WSL and parabola WSP, restricted polynomials of non-integer order based on approximations of the functions proposed by Andronov (2012) and Mikulasek (2015)) and generally asymmetric functions (asymptotic parabola AP, parabolic spline PS, generalized hyperbolic secant function SECH and log-normal-like BSK). This software is a successor of the Observation Obscurer, with some features for variable star research, including a block for the running parabola RP scalegram and approximation. Whereas the RP is oriented toward approximation of the complete data set, MAVKA is aimed at the parts of the light curve close to extrema (including total eclipses and transits of stars and exoplanets). The functions for wider intervals, covering the eclipse totally, were discussed in 2017Ap.....60...57A. Global and local approximations are reviewed in 2020kdbd.book..191A. The software is available at http://uavso.org.ua/mavka and https://katerynaandrych.wixsite.com/mavka. We have analyzed data from our own observations, as well as from monitoring at ground-based and space (currently, mainly TESS) observatories. The software may be used for signals of any nature.
A general purpose fitting model for one-dimensional astrometric signals is developed, building on a maximum likelihood framework, and its performance is evaluated by simulation over a set of realistic image instances. The fit quality is analysed as a function of the number of terms used for signal expansion, and of astrometric error, rather than RMS discrepancy with respect to the input signal. The tuning of the function basis to the statistical characteristics of the signal ensemble is discussed. The fit sensitivity to a priori knowledge of the source spectra is addressed. Some implications of the current results on calibration and data reduction aspects are discussed, in particular with respect to Gaia.
Hypothesis Selection is a fundamental distribution learning problem where, given a comparator class $Q=\{q_1,\ldots,q_n\}$ of distributions and sampling access to an unknown target distribution $p$, the goal is to output a distribution $q$ such that $\mathsf{TV}(p,q)$ is close to $opt$, where $opt = \min_i\{\mathsf{TV}(p,q_i)\}$ and $\mathsf{TV}(\cdot,\cdot)$ denotes the total-variation distance. Despite the fact that this problem has been studied since the 19th century, its complexity in terms of basic resources, such as number of samples and approximation guarantees, remains unsettled (this is discussed, e.g., in the charming book by Devroye and Lugosi '00). This is in stark contrast with other (younger) learning settings, such as PAC learning, for which these complexities are well understood. We derive an optimal $2$-approximation learning strategy for the Hypothesis Selection problem, outputting $q$ such that $\mathsf{TV}(p,q) \leq 2 \cdot opt + \epsilon$, with a (nearly) optimal sample complexity of~$\tilde O(\log n/\epsilon^2)$. This is the first algorithm that simultaneously achieves the best approximation factor and sample complexity: previously, Bousquet, Kane, and Moran (COLT '19) gave a learner achieving the optimal $2$-approximation, but with an exponentially worse sample complexity of $\tilde O(\sqrt{n}/\epsilon^{2.5})$, and Yatracos~(Annals of Statistics '85) gave a learner with optimal sample complexity of $O(\log n/\epsilon^2)$ but with a sub-optimal approximation factor of $3$.
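To make the problem setup concrete, here is a small Python sketch of the classical minimum distance estimate over Yatracos sets, i.e. the factor-$3$ baseline attributed to Yatracos above, not the $2$-approximation algorithm of this abstract. It assumes a finite domain $\{0,\ldots,k-1\}$, candidates given as explicit probability vectors, and at least two candidates; the function names and the toy data are made up for the illustration.

```python
import numpy as np

def tv(p, q):
    """Total-variation distance between two probability vectors."""
    return 0.5 * float(np.abs(np.asarray(p, float) - np.asarray(q, float)).sum())

def minimum_distance_select(candidates, samples):
    """Minimum distance estimate over the Yatracos class (sketch).

    candidates: array of shape (n, k), each row a distribution on {0..k-1}
    samples:    1-D array of non-negative integers drawn from the target p
    Returns (index of the selected candidate, per-candidate scores).
    """
    Q = np.asarray(candidates, dtype=float)
    n, k = Q.shape
    p_hat = np.bincount(samples, minlength=k) / len(samples)  # empirical pmf
    # Yatracos sets A_ij = {x : q_i(x) > q_j(x)}, stored as 0/1 indicator rows
    S = np.array([Q[i] > Q[j] for i in range(n) for j in range(n) if i != j],
                 dtype=float)
    emp = S @ p_hat                 # empirical mass of every set
    mass = S @ Q.T                  # mass[a, i] = q_i(A_a)
    scores = np.abs(mass - emp[:, None]).max(axis=0)
    return int(np.argmin(scores)), scores

# Toy example (assumed data): three candidates on a 4-point domain
Q = np.array([[0.70, 0.10, 0.10, 0.10],
              [0.25, 0.25, 0.25, 0.25],
              [0.10, 0.10, 0.10, 0.70]])
p = np.array([0.65, 0.15, 0.10, 0.10])
rng = np.random.default_rng(0)
x = rng.choice(4, size=5000, p=p)
idx, scores = minimum_distance_select(Q, x)
print("selected candidate:", idx, "TV to target:", tv(p, Q[idx]))
```

The number of Yatracos sets grows quadratically in $n$, so this direct implementation is only meant for small candidate classes.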
Cancer and healthy cells have distinct distributions of molecular properties and thus respond differently to drugs. Cancer drugs ideally kill cancer cells while limiting harm to healthy cells. However, the inherent variance among cells in both cancer and healthy cell populations increases the difficulty of selective drug action. Here we propose a classification framework based on the idea that an ideal cancer drug should maximally discriminate between cancer and healthy cells. We first explore how molecular markers can be used to discriminate cancer cells from healthy cells on a single cell basis, and then how the effects of drugs are statistically predicted by these molecular markers. We then combine these two ideas to show how to optimally match drugs to tumor cells. We find that expression levels of a handful of genes suffice to discriminate well between individual cells in cancer and healthy tissue. We also find that gene expression predicts the efficacy of cancer drugs, suggesting that the cancer drugs act as classifiers using gene profiles. In agreement with our first finding, a small number of genes predict drug efficacy well. Finally, we formulate a framework that defines an optimal drug, and predicts drug cocktails that may target cancer more accurately than the individual drugs alone. Conceptualizing cancer drugs as solving a discrimination problem in the high-dimensional space of molecular markers promises to inform the design of new cancer drugs and drug cocktails.