In epidemiological or demographic studies with variable age at onset, a typical quantity of interest is the incidence of a disease (for example, cancer incidence). In these studies, individuals are usually highly heterogeneous in terms of date of birth (the cohort) and calendar time (the period), so appropriate estimation methods are needed. This article presents a new estimation method that extends classical age-period-cohort analysis by allowing interactions between age, period and cohort effects. It introduces a bidimensional regularized estimate of the hazard rate in which a penalty is placed on the likelihood of the model. This penalty can be designed either to smooth the hazard rate or to force consecutive values of the hazard to be equal, leading to a parsimonious representation of the hazard rate. In the latter case, we use an iterative penalized likelihood scheme to approximate the L0 norm, which makes the computation tractable. The method is evaluated on simulated data and applied to breast cancer survival data from the SEER program.
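To make the penalized likelihood idea concrete, here is a minimal sketch. It assumes a piecewise-constant hazard on an age-by-cohort grid. The symbols $O_{j,k}$ (observed events), $R_{j,k}$ (time at risk), $\eta_{j,k}$ (log-hazard), $\kappa$ (penalty strength), the weights $v_{j,k}$, $w_{j,k}$ and the constant $\varepsilon$ are all introduced here for illustration and do not come from the abstract.

```latex
% Penalized log-likelihood for a piecewise-constant hazard exp(eta_{j,k})
\ell^{\mathrm{pen}}(\eta)
  = \sum_{j,k} \left\{ O_{j,k}\,\eta_{j,k} - R_{j,k}\,e^{\eta_{j,k}} \right\}
  - \frac{\kappa}{2} \sum_{j,k} \left\{
      v_{j,k}\,(\eta_{j+1,k} - \eta_{j,k})^{2}
    + w_{j,k}\,(\eta_{j,k+1} - \eta_{j,k})^{2} \right\}

% Smoothing: v_{j,k} = w_{j,k} = 1.
% L0-type approximation: alternate maximization with the weight update
v_{j,k} \leftarrow \frac{1}{(\eta_{j+1,k} - \eta_{j,k})^{2} + \varepsilon^{2}},
\qquad
w_{j,k} \leftarrow \frac{1}{(\eta_{j,k+1} - \eta_{j,k})^{2} + \varepsilon^{2}}
```

With unit weights the penalty shrinks neighboring log-hazards toward each other, which smooths the surface. With the reweighting step, each term $w\,(\Delta\eta)^2$ is close to 1 when the difference is nonzero and close to 0 otherwise, so the sum roughly counts the jumps. This is one common way to approximate an L0 penalty (often called adaptive ridge), and it may differ in detail from the scheme used in the paper.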
We present a joint copula-based model for insurance claims and sizes. It uses bivariate copulae to accommodate the dependence between these quantities. We derive the general distribution of the policy loss without the restrictive assumption of independence. We illustrate that this distribution tends to be skewed and multi-modal, and that an independence assumption can lead to substantial bias in the estimation of the policy loss. Further, we extend our framework to regression models by combining marginal generalized linear models with a copula. We show that this approach leads to a flexible class of models, and that the parameters can be estimated efficiently using maximum likelihood. We propose a test procedure for the selection of the optimal copula family. The usefulness of our approach is illustrated in a simulation study and in an analysis of car insurance policies.
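The effect of ignoring dependence can be illustrated with a small simulation. The sketch below is not the authors' model: it assumes a Gaussian copula, a Poisson marginal for the number of claims and an exponential marginal for the average claim size, and it takes the policy loss to be the product of the two. All function names and parameter values are hypothetical.

```python
import math
import random


def std_normal_cdf(z):
    """Standard normal CDF, used to map Gaussian draws to uniforms."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def poisson_quantile(u, lam):
    """Smallest k with P(N <= k) >= u for N ~ Poisson(lam)."""
    k = 0
    pmf = math.exp(-lam)
    cdf = pmf
    while cdf < u:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k


def simulate_mean_loss(rho, n_sims=20000, lam=2.0, mean_size=1000.0, seed=0):
    """Monte Carlo mean of (claim count) x (average claim size).

    Assumption: the dependence comes from a Gaussian copula with
    correlation rho; rho = 0 reproduces the independence case.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Clamp below 1 so the quantile functions stay finite.
        u1 = min(std_normal_cdf(z1), 1.0 - 1e-12)
        u2 = min(std_normal_cdf(z2), 1.0 - 1e-12)
        n_claims = poisson_quantile(u1, lam)
        avg_size = -mean_size * math.log(1.0 - u2)  # exponential quantile
        total += n_claims * avg_size
    return total / n_sims


mean_loss_independent = simulate_mean_loss(rho=0.0)
mean_loss_dependent = simulate_mean_loss(rho=0.7)
```

Under independence the expected loss is the product of the two marginal means (2000 with these hypothetical values). Positive dependence adds a covariance term, so an independence assumption would underestimate the expected loss in this setup.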
We give an overview of eight different software packages and functions available in R for semi- or non-parametric estimation of the hazard rate for right-censored survival data. Of particular interest is the accuracy of the estimation of the hazard rate in the presence of covariates, as well as the user-friendliness of the packages. In addition, we investigate the ability to incorporate covariates under both the proportional and the non-proportional hazards assumptions. We contrast the robustness, variability and precision of the functions through simulations, and then further compare differences between the functions by analyzing the cancer and TRACE survival data sets available in R, including covariates under the proportional and non-proportional hazards settings.
In this paper we describe an algorithm for predicting the websites at risk in a long-range hacking activity, while jointly inferring the provenance and evolution of vulnerabilities on websites over continuous time. Specifically, we use hazard regression with a time-varying additive hazard function parameterized in a generalized linear form. The activation coefficients on each feature are continuous-time functions constrained with a total variation penalty inspired by hacking campaigns. We show that the optimal solution is a 0th-order spline with a finite number of adaptively chosen knots, and that it can be computed efficiently. Experiments on real data show that our method significantly outperforms classic methods while providing meaningful interpretability.
Kernel-based nonparametric hazard rate estimation is considered with a special class of infinite-order kernels that achieves favorable bias and mean square error properties. A fully automatic and adaptive implementation of a density and hazard rate estimator is proposed for randomly right censored data. Careful selection of the bandwidth in the proposed estimators yields estimates that are more efficient in terms of overall mean squared error performance, and in some cases achieves a nearly parametric convergence rate. Additionally, rapidly converging bandwidth estimates are presented for use in second-order kernels to supplement such kernel-based methods in hazard rate estimation. Simulations illustrate the improved accuracy of the proposed estimator against other nonparametric estimators of the density and hazard function. A real data application is also presented on survival data from 13,166 breast carcinoma patients.
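For orientation, a kernel hazard estimator for right-censored data can be sketched in a few lines. This sketch is a deliberate simplification: it smooths the Nelson-Aalen increments with a second-order Epanechnikov kernel and a fixed bandwidth. It does not implement the infinite-order kernels or the automatic bandwidth selection described in the abstract. The function names, the simulated data and the bandwidth value are hypothetical.

```python
import random


def epanechnikov(u):
    """Second-order Epanechnikov kernel supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_hazard(times, events, t, bandwidth):
    """Kernel-smoothed Nelson-Aalen estimate of the hazard at time t.

    times  : observed times (event or censoring)
    events : True where the event was observed, False where censored
    Assumes no tied times, so the risk set at the i-th ordered time
    has size n - i (with i starting at 0).
    """
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    estimate = 0.0
    for rank, i in enumerate(order):
        if events[i]:
            at_risk = n - rank
            estimate += epanechnikov((t - times[i]) / bandwidth) / at_risk
    return estimate / bandwidth


# Simulated check: exponential event times (constant hazard 0.5)
# with independent uniform censoring.
rng = random.Random(1)
true_rate = 0.5
times, events = [], []
for _ in range(2000):
    event_time = rng.expovariate(true_rate)
    censor_time = rng.uniform(0.0, 6.0)
    times.append(min(event_time, censor_time))
    events.append(event_time <= censor_time)

hazard_at_1 = kernel_hazard(times, events, t=1.0, bandwidth=0.5)
```

With a constant true hazard, the estimate at an interior point should land near the true rate. In practice the result depends strongly on the bandwidth, which is the issue the proposed adaptive selection addresses.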
Information geometry uses the formal tools of differential geometry to describe the space of probability distributions as a Riemannian manifold with an additional dual structure. The formal equivalence of compositional data with discrete probability distributions makes it possible to apply the same description to the sample space of Compositional Data Analysis (CoDA). The latter has been formally described as a Euclidean space with an orthonormal basis featuring components that are suitable combinations of the original parts. In contrast to the Euclidean metric, the information-geometric description singles out the Fisher information metric as the only one keeping the manifold's geometric structure invariant under equivalent representations of the underlying random variables. Well-known concepts that are valid in Euclidean coordinates, e.g., the Pythagorean theorem, are generalized by information geometry to corresponding notions that hold for more general coordinates. In briefly reviewing Euclidean CoDA and, in more detail, the information-geometric approach, we show how the latter justifies the use of distance measures and divergences that so far have received little attention in CoDA as they do not fit the Euclidean geometry favored by current thinking. We also show how entropy and relative entropy can describe amalgamations in a simple way, while Aitchison distance requires the use of geometric means to obtain more succinct relationships. We proceed to prove the information monotonicity property for Aitchison distance. We close with some thoughts about new directions in CoDA where the rich structure that is provided by information geometry could be exploited.
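The two kinds of dissimilarity contrasted above can be computed side by side. The following sketch uses the standard definitions of the centered log-ratio (clr) transform, the Aitchison distance and the relative entropy (Kullback-Leibler divergence). The example compositions and the grouping of parts are hypothetical, and the check shown is the well-known monotonicity of relative entropy under amalgamation, not the paper's result for Aitchison distance.

```python
import math


def closure(x):
    """Rescale positive parts so they sum to 1."""
    total = sum(x)
    return [v / total for v in x]


def clr(x):
    """Centered log-ratio transform: log of each part over the geometric mean."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(x, y):
    """Euclidean distance between clr-transformed compositions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


def kl_divergence(p, q):
    """Relative entropy of p with respect to q, after closure."""
    p, q = closure(p), closure(q)
    return sum(a * math.log(a / b) for a, b in zip(p, q))


def amalgamate(x, groups):
    """Sum the parts within each group of indices."""
    return [sum(x[i] for i in group) for group in groups]


p = [0.1, 0.2, 0.3, 0.4]
q = [0.25, 0.25, 0.25, 0.25]
groups = [(0, 1), (2,), (3,)]  # merge the first two parts

kl_full = kl_divergence(p, q)
kl_merged = kl_divergence(amalgamate(p, groups), amalgamate(q, groups))
dist_full = aitchison_distance(p, q)
dist_rescaled = aitchison_distance([10.0 * v for v in p], q)
```

Here `kl_merged` does not exceed `kl_full`, which is the information monotonicity of relative entropy under coarse-graining, and `dist_rescaled` matches `dist_full` up to rounding because the clr transform removes any common scale factor.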