No Arabic abstract
Aiming to generate realistic synthetic times series of the bivariate process of daily mean temperature and precipitations, we introduce a non-homogeneous hidden Markov model. The non-homogeneity lies in periodic transition probabilities between the hidden states, and time-dependent emission distributions. This enables the model to account for the non-stationary behaviour of weather variables. By carefully choosing the emission distributions, it is also possible to model the dependance structure between the two variables. The model is applied to several weather stations in Europe with various climates, and we show that it is able to simulate realistic bivariate time series.
Atmospheric trace-gas inversion is the procedure by which the sources and sinks of a trace gas are identified from observations of its mole fraction at isolated locations in space and time. This is inherently a spatio-temporal bivariate inversion problem, since the mole-fraction field evolves in space and time and the flux is also spatio-temporally distributed. Further, the bivariate model is likely to be non-Gaussian since the flux field is rarely Gaussian. Here, we use conditioning to construct a non-Gaussian bivariate model, and we describe some of its properties through auto- and cross-cumulant functions. A bivariate non-Gaussian, specifically trans-Gaussian, model is then achieved through the use of Box--Cox transformations, and we facilitate Bayesian inference by approximating the likelihood in a hierarchical framework. Trace-gas inversion, especially at high spatial resolution, is frequently highly sensitive to prior specification. Therefore, unlike conventional approaches, we assimilate trace-gas inventory information with the observational data at the parameter layer, thus shifting prior sensitivity from the inventory itself to its spatial characteristics (e.g., its spatial length scale). We demonstrate the approach in controlled-experiment studies of methane inversion, using fluxes extracted from inventories of the UK and Ireland and of Northern Australia.
Intrusion detection is only a starting step in securing IT infrastructure. Prediction of intrusions is the next step to provide an active defense against incoming attacks. Current intrusion prediction methods focus mainly on prediction of either intrusion type or intrusion category and do not use or provide contextual information such as source and target IP address. In addition most of them are dependant on domain knowledge and specific scenario knowledge. The proposed algorithm employs a bag-of-words model together with a hidden Markov model which not depend on specific domain knowledge. Since this algorithm depends on a training process it is adaptable to different conditions. A key advantage of the proposed algorithm is the inclusion of contextual data such as source IP address, destination IP range, alert type and alert category in its prediction, which is crucial for an eventual response. Experiments conducted using a public data set generated over 2500 alert predictions and achieved accuracy of 81% and 77% for single step and five step predictions respectively for prediction of the next alert cluster. It also achieved an accuracy of prediction of 95% and 92% for single step and five step predictions respectively for prediction of the next alert category. The proposed methods achieved a prediction accuracy improvement of 5% for alert category over existing variable length Markov chain intrusion prediction methods, while providing more information for a possible defense.
The Markov-modulated Poisson process is utilised for count modelling in a variety of areas such as queueing, reliability, network and insurance claims analysis. In this paper, we extend the Markov-modulated Poisson process framework through the introduction of a flexible frequency perturbation measure. This contribution enables known information of observed event arrivals to be naturally incorporated in a tractable manner, while the hidden Markov chain captures the effect of unobservable drivers of the data. In addition to increases in accuracy and interpretability, this method supplements analysis of the latent factors. Further, this procedure naturally incorporates data features such as over-dispersion and autocorrelation. Additional insights can be generated to assist analysis, including a procedure for iterative model improvement. Implementation difficulties are also addressed with a focus on dealing with large data sets, where latent models are especially advantageous due the large number of observations facilitating identification of hidden factors. Namely, computational issues such as numerical underflow and high processing cost arise in this context and in this paper, we produce procedures to overcome these problems. This modelling framework is demonstrated using a large insurance data set to illustrate theoretical, practical and computational contributions and an empirical comparison to other count models highlight the advantages of the proposed approach.
The identification of precipitation regimes is important for many purposes such as agricultural planning, water resource management, and return period estimation. Since precipitation and other related meteorological data typically exhibit spatial dependency and different characteristics at different time scales, clustering such data presents unique challenges. In this paper, we develop a flexible model-based approach to cluster multi-scale spatial functional data to address such problems. The underlying clustering model is a functional linear model , and the cluster memberships are assumed to be a realization from a Markov random field with geographic covariates. The methodology is applied to a precipitation data from China to identify precipitation regimes.
We address the problem of modeling constrained hospital resources in the midst of the COVID-19 pandemic in order to inform decision-makers of future demand and assess the societal value of possible interventions. For broad applicability, we focus on the common yet challenging scenario where patient-level data for a region of interest are not available. Instead, given daily admissions counts, we model aggregated counts of observed resource use, such as the number of patients in the general ward, in the intensive care unit, or on a ventilator. In order to explain how individual patient trajectories produce these counts, we propose an aggregate count explicit-duration hidden Markov model, nicknamed the ACED-HMM, with an interpretable, compact parameterization. We develop an Approximate Bayesian Computation approach that draws samples from the posterior distribution over the models transition and duration parameters given aggregate counts from a specific location, thus adapting the model to a region or individual hospital site of interest. Samples from this posterior can then be used to produce future forecasts of any counts of interest. Using data from the United States and the United Kingdom, we show our mechanistic approach provides competitive probabilistic forecasts for the future even as the dynamics of the pandemic shift. Furthermore, we show how our model provides insight about recovery probabilities or length of stay distributions, and we suggest its potential to answer challenging what-if questions about the societal value of possible interventions.