No Arabic abstract
In this paper we describe an algorithm for predicting the websites at risk in a long range hacking activity, while jointly inferring the provenance and evolution of vulnerabilities on websites over continuous time. Specifically, we use hazard regression with a time-varying additive hazard function parameterized in a generalized linear form. The activation coefficients on each feature are continuous-time functions constrained with total variation penalty inspired by hacking campaigns. We show that the optimal solution is a 0th order spline with a finite number of adaptively chosen knots, and can be solved efficiently. Experiments on real data show that our method significantly outperforms classic methods while providing meaningful interpretability.
We give an overview of eight different software packages and functions available in R for semi- or non-parametric estimation of the hazard rate for right-censored survival data. Of particular interest is the accuracy of the estimation of the hazard rate in the presence of covariates, as well as the user-friendliness of the packages. In addition, we investigate the ability to incorporate covariates under both the proportional and the non-proportional hazards assumptions. We contrast the robustness, variability and precision of the functions through simulations, and then further compare differences between the functions by analyzing the cancer and TRACE survival data sets available in R, including covariates under the proportional and non-proportional hazards settings.
In epidemiological or demographic studies, with variable age at onset, a typical quantity of interest is the incidence of a disease (for example the cancer incidence). In these studies, the individuals are usually highly heterogeneous in terms of dates of birth (the cohort) and with respect to the calendar time (the period) and appropriate estimation methods are needed. In this article a new estimation method is presented which extends classical age-period-cohort analysis by allowing interactions between age, period and cohort effects. This paper introduces a bidimensional regularized estimate of the hazard rate where a penalty is introduced on the likelihood of the model. This penalty can be designed either to smooth the hazard rate or to enforce consecutive values of the hazard to be equal, leading to a parsimonious representation of the hazard rate. In the latter case, we make use of an iterative penalized likelihood scheme to approximate the L0 norm, which makes the computation tractable. The method is evaluated on simulated data and applied on breast cancer survival data from the SEER program.
The concepts of Gross Domestic Product (GDP), GDP per capita, and population are central to the study of political science and economics. However, a growing literature suggests that existing measures of these concepts contain considerable error or are based on overly simplistic modeling choices. We address these problems by creating a dynamic, three-dimensional latent trait model, which uses observed information about GDP, GDP per capita, and population to estimate posterior prediction intervals for each of these important concepts. By combining historical and contemporary sources of information, we are able to extend the temporal and spatial coverage of existing datasets for country-year units back to 1500 A.D through 2015 A.D. and, because the model makes use of multiple indicators of the underlying concepts, we are able to estimate the relative precision of the different country-year estimates. Overall, our latent variable model offers a principled method for incorporating information from different historic and contemporary data sources. It can be expanded or refined as researchers discover new or alternative sources of information about these concepts.
In recent years there has been a resurgence of interest in generation adequacy risk assessment, due to the need to include variable generation renewables within such calculations. This paper will describe new statistical approaches to estimating the joint distribution of demand and available VG capacity; this is required for the LOLE calculations used in many statutory adequacy studies, for example those of GB and PJM. The most popular estimation technique in the VG-integration literature is `hindcast, in which the historic joint distribution of demand and available VG is used as a predictive distribution. Through the use of bootstrap statistical analysis, this paper will show that due to extreme sparsity of data on times of high demand and low VG, hindcast results can suffer from sampling uncertainty to the extent that they have little practical meaning. An alternative estimation approach, in which a marginal distribution of available VG is rescaled according to demand level, is thus proposed. This reduces sampling uncertainty at the expense of the additional model structure assumption, and further provides a means of assessing the sensitivity of model outputs to the VG-demand relationship by varying the function of demand by which the marginal VG distribution is rescaled.
The vast majority of landslide susceptibility studies assumes the slope instability process to be time-invariant under the definition that the past and present are keys to the future. This assumption may generally be valid. However, the trigger, be it a rainfall or an earthquake event, clearly varies over time. And yet, the temporal component of the trigger is rarely included in landslide susceptibility studies and only confined to hazard assessment. In this work, we investigate a population of landslides triggered in response to the 2017 Jiuzhaigou earthquake ($M_w = 6.5$) including the associated ground motion in the analyses, these being carried out at the Slope Unit (SU) level. We do this by implementing a Bayesian version of a Generalized Additive Model and assuming that the slope instability across the SUs in the study area behaves according to a Bernoulli probability distribution. This procedure would generally produce a susceptibility map reflecting the spatial pattern of the specific trigger and therefore of limited use for land use planning. However, we implement this first analytical step to reliably estimate the ground motion effect, and its distribution, on unstable SUs. We then assume the effect of the ground motion to be time-invariant, enabling statistical simulations for any ground motion scenario that occurred in the area from 1933 to 2017. As a result, we obtain the full spectrum of potential susceptibility patterns over the last century and compress this information into a susceptibility model/map representative of all the possible ground motion patterns since 1933. This backward statistical simulations can also be further exploited in the opposite direction where, by accounting for scenario-based ground motion, one can also use it in a forward direction to estimate future unstable slopes.