The detection and analysis of events within massive collections of time series has become an increasingly important task for time-domain astronomy. In particular, many scientific investigations (e.g., the analysis of microlensing and other transients) begin with the detection of isolated events in irregularly sampled series that contain both non-linear trends and non-Gaussian noise. We outline a semi-parametric, robust, parallel method for identifying variability and isolated events at multiple scales in the presence of these complications. The approach harnesses the power of Bayesian modeling while retaining much of the speed and scalability of more ad hoc machine learning approaches. We also contrast this work with event detection methods from other fields, highlighting the unique challenges posed by astronomical surveys. Finally, we present results from applying the method to 87.2 million EROS-2 sources, where it reduced the number of candidates for certain types of phenomena more than 100-fold while producing high-quality features for subsequent analyses.
A utility-based Bayesian population finding (BaPoFi) method was proposed by Morita and Muller (2017, Biometrics, 1355-1365) for analyzing data from a randomized clinical trial, with the aim of identifying good predictive baseline covariates for optimizing the target population of a future study. The approach casts population finding as a formal decision problem, combined with a flexible probability model that uses a random forest to define the regression mean function. BaPoFi handles a single continuous or binary outcome variable. In this paper, we develop BaPoFi-TTE, an extension of the earlier approach to the clinically important case of time-to-event (TTE) data with censoring that also accounts for a toxicity outcome. We model the association between TTE data and baseline covariates using a semi-parametric failure time model, with a Polya tree prior for the unknown error distribution and a random forest for a flexible regression mean function. We define a utility function that captures the trade-off between efficacy and toxicity, one of the key clinical considerations in population finding. We examine the operating characteristics of the proposed method in extensive simulation studies and, for illustration, apply it to data from a randomized oncology clinical trial. Concerns raised by a preliminary analysis of the same data using a parametric model motivated this more general approach.
Mortality differs across countries, states and regions. Several empirical studies, however, reveal that mortality trends exhibit a common pattern and show similar structures across populations. The key element in analyzing mortality rates is a time-varying indicator curve. Our main interest lies in validating the existence of common trends among these curves, similar gender differences, and the variability in location among the curves at the national level. Motivated by these empirical findings, we estimate and forecast mortality rates using a semi-parametric approach applied to multiple curves with shape-related nonlinear variation. This approach captures the common features shared by the curve functions while characterizing the nonlinear variation through a few deviation parameters. These parameters provide an informative summary of the time-varying curve functions and can be used to produce indicative forecasts for countries with sparse data. The model is illustrated with mortality rates for Japan and China and then extended to include more countries.
Disease surveillance is essential not only for the early detection of outbreaks but also for monitoring long-run trends in disease incidence. In this paper, we aim to build a tactical model for dengue surveillance in particular. Most existing models for dengue prediction exploit the known relationships of climate and socio-demographic factors with incidence counts; however, they are not flexible enough to capture steep, sudden rises and falls in those counts. This limitation motivates the methodology used in our paper. We build a flexible, non-parametric Gaussian process (GP) regression model that relies on past dengue incidence counts and climate covariates, and show that it performs accurately in comparison with other existing methodologies, making it a good tactical and robust model for health authorities planning their course of action.
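A generic GP regression of the kind described can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' model: the abstract does not specify the kernel, the lag structure, or which climate covariates are used, so the squared-exponential (RBF) kernel, the fixed hyperparameters (`length_scale`, `variance`, `noise`), the number of lags, the single climate series and the constant prior mean are all assumptions, and the helper names `lagged_features`, `rbf_kernel` and `gp_predict` are hypothetical.

```python
import numpy as np


def lagged_features(counts, climate, n_lags=3):
    """Hypothetical design matrix: the previous n_lags incidence counts plus
    the climate covariate at t-1 are used to predict the count at time t."""
    counts = np.asarray(counts, dtype=float)
    climate = np.asarray(climate, dtype=float)
    X, y = [], []
    for t in range(n_lags, len(counts)):
        X.append(np.concatenate([counts[t - n_lags:t], [climate[t - 1]]]))
        y.append(counts[t])
    return np.array(X), np.array(y)


def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    sq = np.maximum(sq, 0.0)  # guard against tiny negative values from rounding
    return variance * np.exp(-0.5 * sq / length_scale**2)


def gp_predict(X_train, y_train, X_test, length_scale=1.0, variance=1.0, noise=1e-2):
    """Posterior mean and variance of a GP with Gaussian observation noise.

    Assumption: a constant prior mean equal to the training average."""
    y_mean = y_train.mean()
    K = rbf_kernel(X_train, X_train, length_scale, variance) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, length_scale, variance)
    L = np.linalg.cholesky(K)  # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))  # K^{-1}(y - mean)
    mean = y_mean + K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v**2, axis=0)  # diagonal of the test covariance is `variance`
    return mean, var
```

Because the RBF kernel is sensitive to input scale, the lagged counts and climate values would normally be standardized (and counts often log-transformed) before calling `gp_predict`, and the hyperparameters would typically be fitted, for example by maximizing the marginal likelihood, rather than fixed as here.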
In this paper, we experimentally compare semi-parametric models (Cox proportional hazards model, Aalen's additive regression model), a parametric model (Weibull accelerated failure time (AFT) model), and machine learning models (Random Survival Forest, Gradient Boosting with Cox proportional hazards loss, DeepSurv) using the concordance index on two datasets (PBC and GBCSG2). We present two comparisons: one using the default hyper-parameters of these models and one using the best hyper-parameters found by randomized search.
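The concordance index used for the comparison can be computed directly from observed times, event indicators and model risk scores. The sketch below is a plain-Python version of Harrell's C-index for right-censored data, assuming that a higher risk score means shorter predicted survival. A pair is comparable only when the shorter observed time ends in an event; pairs with tied times are skipped for simplicity, and tied risk scores count as half-concordant. Library implementations may handle tied times and the direction of the score differently, so the function name `concordance_index` here is only illustrative.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index: fraction of comparable pairs ordered correctly.

    times: observed (event or censoring) times.
    events: 1 if the event was observed, 0 if right-censored.
    risk_scores: higher value = higher predicted risk (shorter survival).
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times and pairs whose earlier time is censored.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier failure was assigned higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Example: the third subject is censored at t = 3.
print(concordance_index([1, 2, 3, 4], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1]))  # 0.8
```

A value of 1.0 means perfect ranking and 0.5 corresponds to random ordering. The double loop is O(n²), which is fine for datasets of the size of PBC but slow for very large cohorts.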
Early detection of changes in the frequency of events is an important task in, for example, disease surveillance, monitoring of high-quality processes, reliability monitoring and public health. In this article, we focus on detecting changes in multivariate event data by monitoring the time between events (TBE). Existing multivariate TBE charts are limited in that they only signal after an event has occurred in each of the individual processes. This results in delays (i.e., a long time to signal), especially when the interest is in detecting a change in only one or a few of the processes. We propose a bivariate TBE (BTBE) chart that is able to signal in real time. We derive analytical expressions for the control limits and the average time-to-signal performance, conduct a performance evaluation, and compare our chart with an existing method. The findings show that our method is a realistic approach for monitoring bivariate time-between-events data and has better detection ability than existing methods. A major benefit of our method is that it signals in real time and, because of the analytical expressions, requires no simulation. The proposed method is applied to a real-life dataset related to AIDS.
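The abstract does not reproduce the BTBE control limits, but the idea of closed-form limits and average time to signal (ATS) can be illustrated with the classical univariate exponential TBE chart (the t-chart), which is not the proposed bivariate method. Assuming in-control times between events are exponential with mean `theta0`, equal-tailed probability limits that each exclude `alpha/2` of the in-control distribution follow directly from the exponential CDF, and by Wald's identity the ATS equals the expected number of plotted points (the ARL) times the mean time between events. The function names and the equal-tailed design are assumptions for illustration.

```python
import math


def t_chart_limits(theta0, alpha):
    """Equal-tailed probability limits for a univariate exponential TBE (t-)chart.

    theta0: in-control mean time between events.
    alpha: false-alarm probability per plotted point.
    """
    lcl = -theta0 * math.log(1.0 - alpha / 2.0)  # P(T < lcl) = alpha/2 in control
    ucl = -theta0 * math.log(alpha / 2.0)        # P(T > ucl) = alpha/2 in control
    return lcl, ucl


def average_time_to_signal(theta, lcl, ucl):
    """ATS when times between events are exponential with mean theta."""
    p_signal = (1.0 - math.exp(-lcl / theta)) + math.exp(-ucl / theta)
    arl = 1.0 / p_signal  # expected number of plotted points until a signal
    return arl * theta    # Wald's identity: E[total time] = E[N] * E[T]


lcl, ucl = t_chart_limits(theta0=10.0, alpha=0.0027)
print(average_time_to_signal(10.0, lcl, ucl))  # in control: theta0 / alpha, about 3704
print(average_time_to_signal(5.0, lcl, ucl))   # event rate doubled: shorter ATS
```

Because both the limits and the ATS are closed-form, no simulation is needed to tune this univariate chart, which mirrors the practical benefit the abstract claims for its bivariate extension.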