No Arabic abstract
In this paper, we make an experimental comparison of semi-parametric (Cox proportional hazards model, Aalens additive regression model), parametric (Weibull AFT model), and machine learning models (Random Survival Forest, Gradient Boosting with Cox Proportional Hazards Loss, DeepSurv) through the concordance index on two different datasets (PBC and GBCSG2). We present two comparisons: one with the default hyper-parameters of these models and one with the best hyper-parameters found by randomized search.
Semi-parametric survival analysis methods like the Cox Proportional Hazards (CPH) regression (Cox, 1972) are a popular approach for survival analysis. These methods involve fitting of the log-proportional hazard as a function of the covariates and are convenient as they do not require estimation of the baseline hazard rate. Recent approaches have involved learning non-linear representations of the input covariates and demonstrate improved performance. In this paper we argue against such deep parameterizations for survival analysis and experimentally demonstrate that more interpretable semi-parametric models inspired from mixtures of experts perform equally well or in some cases better than such overly parameterized deep models.
The detection and analysis of events within massive collections of time-series has become an extremely important task for time-domain astronomy. In particular, many scientific investigations (e.g. the analysis of microlensing and other transients) begin with the detection of isolated events in irregularly-sampled series with both non-linear trends and non-Gaussian noise. We outline a semi-parametric, robust, parallel method for identifying variability and isolated events at multiple scales in the presence of the above complications. This approach harnesses the power of Bayesian modeling while maintaining much of the speed and scalability of more ad-hoc machine learning approaches. We also contrast this work with event detection methods from other fields, highlighting the unique challenges posed by astronomical surveys. Finally, we present results from the application of this method to 87.2 million EROS-2 sources, where we have obtained a greater than 100-fold reduction in candidates for certain types of phenomena while creating high-quality features for subsequent analyses.
We consider a model-based approach to perform batch off-policy evaluation in reinforcement learning. Our method takes a mixture-of-experts approach to combine parametric and non-parametric models of the environment such that the final value estimate has the least expected error. We do so by first estimating the local accuracy of each model and then using a planner to select which model to use at every time step as to minimize the return error estimate along entire trajectories. Across a variety of domains, our mixture-based approach outperforms the individual models alone as well as state-of-the-art importance sampling-based estimators.
We examine the question of when and how parametric models are most useful in reinforcement learning. In particular, we look at commonalities and differences between parametric models and experience replay. Replay-based learning algorithms share important traits with model-based approaches, including the ability to plan: to use more computation without additional data to improve predictions and behaviour. We discuss when to expect benefits from either approach, and interpret prior work in this context. We hypothesise that, under suitable conditions, replay-based algorithms should be competitive to or better than model-based algorithms if the model is used only to generate fictional transitions from observed states for an update rule that is otherwise model-free. We validated this hypothesis on Atari 2600 video games. The replay-based algorithm attained state-of-the-art data efficiency, improving over prior results with parametric models.
Semi-parametric regression models are used in several applications which require comprehensibility without sacrificing accuracy. Typical examples are spline interpolation in geophysics, or non-linear time series problems, where the system includes a linear and non-linear component. We discuss here the use of a finite Determinantal Point Process (DPP) for approximating semi-parametric models. Recently, Barthelme, Tremblay, Usevich, and Amblard introduced a novel representation of some finite DPPs. These authors formulated extended L-ensembles that can conveniently represent partial-projection DPPs and suggest their use for optimal interpolation. With the help of this formalism, we derive a key identity illustrating the implicit regularization effect of determinantal sampling for semi-parametric regression and interpolation. Also, a novel projected Nystrom approximation is defined and used to derive a bound on the expected risk for the corresponding approximation of semi-parametric regression. This work naturally extends similar results obtained for kernel ridge regression.