In this work, we study the event occurrences of user activities on online social network platforms. To characterize the social activity interactions among network users, we propose a network group Hawkes (NGH) process model. In particular, the observed network structure is employed to model the users' dynamic posting behaviors. Furthermore, the users are clustered into latent groups according to their dynamic behavior patterns. To estimate the model, a constrained maximum likelihood approach is proposed. Theoretically, we establish the consistency and asymptotic normality of the estimators. In addition, we show that the group memberships can be identified consistently. To conduct estimation, a branching representation structure is first introduced, and a stochastic EM (StEM) algorithm is developed to tackle the computational problem. Lastly, we apply the proposed method to a social network dataset collected from Sina Weibo and identify the influential network users as an interesting application.
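For illustration, a generic network Hawkes intensity with latent group memberships is sketched below; the symbols $a_{ij}$, $g_i$, $\mu$, $\beta$, and $\gamma$ are assumed notation for this sketch and not necessarily the exact NGH parameterization of the paper.

```latex
% Illustrative network Hawkes intensity for user i with latent group g_i:
\lambda_i(t) \;=\; \mu_{g_i} \;+\; \sum_{j=1}^{N} a_{ij}\,
  \beta_{g_i g_j} \sum_{t_{jk} < t} \gamma\, e^{-\gamma (t - t_{jk})},
% where a_{ij} is the observed adjacency matrix, t_{jk} are user j's past
% posting times, \mu and \beta are group-level baseline and excitation
% parameters, and \gamma is the decay rate of an exponential triggering kernel.
```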
We modify ETAS models by replacing the Pareto-like kernel proposed by Ogata with a Mittag-Leffler-type kernel. Provided that the kernel decays as a power law with exponent $\beta + 1 \in (1,2]$, this replacement has the advantage that the Laplace transform of the Mittag-Leffler function is known explicitly, leading to simpler calculation of relevant quantities.
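A common parameterization of such a kernel and its closed-form Laplace transform is shown below; the normalisation is an assumption for illustration and may differ from the paper's exact kernel.

```latex
% For \beta \in (0,1] and rate \lambda > 0, with
% E_{\beta,\beta}(z) = \sum_{k \ge 0} z^k / \Gamma(\beta k + \beta):
f_\beta(t) \;=\; \lambda\, t^{\beta-1}\, E_{\beta,\beta}\!\left(-\lambda t^{\beta}\right),
\qquad t > 0,
% whose Laplace transform is available in closed form,
\widetilde{f}_\beta(s) \;=\; \int_0^\infty e^{-st} f_\beta(t)\,dt
  \;=\; \frac{\lambda}{s^{\beta} + \lambda},
% and whose tail decays like t^{-(\beta+1)}, matching the power-law
% behaviour of Ogata's Pareto-like kernel while keeping transforms explicit.
```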
We test three common information criteria (IC) for selecting the order of a Hawkes process with an intensity kernel that can be expressed as a mixture of exponential terms. These processes find application in high-frequency financial data modelling. The information criteria are Akaike's information criterion (AIC), the Bayesian information criterion (BIC) and the Hannan-Quinn criterion (HQ). Since we work with simulated data, we are able to measure the performance of model selection by the success rate of the IC in selecting the model that was used to generate the data. In particular, we are interested in the relation between correct model selection and underlying sample size. The analysis includes realistic sample sizes and parameter sets from recent literature where parameters were estimated using empirical financial intra-day data. We compare our results to theoretical predictions and similar empirical findings on the asymptotic distribution of model selection for consistent and inconsistent ICs.
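The three criteria compared in the study have standard textbook definitions; a minimal sketch is given below, with the Hawkes likelihood maximisation itself omitted and the function name `information_criteria` introduced here purely for illustration.

```python
import numpy as np

def information_criteria(loglik, k, n):
    """Standard definitions of the three criteria (loglik = maximised
    log-likelihood, k = number of fitted parameters, n = number of
    observed events)."""
    aic = -2.0 * loglik + 2.0 * k
    bic = -2.0 * loglik + k * np.log(n)
    hq  = -2.0 * loglik + 2.0 * k * np.log(np.log(n))
    return {"AIC": aic, "BIC": bic, "HQ": hq}

# The selected order is the candidate model minimising the chosen criterion.
```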
The highly influential two-group model for testing a large number of statistical hypotheses assumes that the test statistics are drawn independently from a mixture of a high-probability null distribution and a low-probability alternative. Optimal control of the marginal false discovery rate (mFDR), in the sense of providing maximal power (expected true discoveries) subject to mFDR control, is known to be achieved by thresholding the local false discovery rate (locFDR), i.e., the probability of a hypothesis being null given the set of test statistics, at a fixed threshold. We address the challenge of optimally controlling the popular false discovery rate (FDR) or positive FDR (pFDR), rather than the mFDR, in the general two-group model, which also allows for dependence between the test statistics. These criteria are less conservative than the mFDR criterion, so they make more rejections in expectation. We derive their optimal multiple testing (OMT) policies, which turn out to be thresholding the locFDR with a threshold that is a function of the entire set of statistics. We develop an efficient algorithm for finding these policies, and use it for problems with thousands of hypotheses. We illustrate these procedures on gene expression studies.
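For context, the baseline mFDR-optimal rule with a fixed locFDR threshold is sketched below in a toy independent two-group model; the mixture parameters `pi0`, `mu1`, and the threshold value are assumed for illustration, and the paper's FDR/pFDR-optimal policies instead use a threshold depending on the whole vector of statistics.

```python
import numpy as np
from scipy.stats import norm

def locfdr_fixed_threshold(z, pi0=0.9, mu1=2.0, thresh=0.2):
    """Toy two-group model: independent N(0,1) nulls and N(mu1,1)
    alternatives. Compute the local FDR for each z-score and reject
    hypotheses whose locFDR falls below a fixed threshold."""
    f0 = norm.pdf(z, 0.0, 1.0)
    f1 = norm.pdf(z, mu1, 1.0)
    f = pi0 * f0 + (1.0 - pi0) * f1      # marginal density of the statistics
    locfdr = pi0 * f0 / f                # P(null | z) under the toy model
    return locfdr, locfdr <= thresh      # rejection set
```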
Change-points are a routine feature of big data observed in the form of high-dimensional data streams. In many such data streams, the component series possess group structures and it is natural to assume that changes only occur in a small number of all groups. We propose a new change-point procedure, called groupInspect, that exploits the group sparsity structure to estimate a projection direction so as to aggregate information across the component series and successfully estimate the change-point in the mean structure of the series. We prove that the estimated projection direction is minimax optimal, up to logarithmic factors, when all group sizes are of comparable order. Moreover, our theory provides strong guarantees on the rate of convergence of the change-point location estimator. Numerical studies demonstrate the competitive performance of groupInspect in a wide range of settings, and a real data example confirms the practical usefulness of our procedure.
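A minimal sketch of projection-based change-point estimation in this spirit is given below; it uses the leading singular vector of the CUSUM transformation as the projection direction and omits the group-sparse penalisation that distinguishes groupInspect, so the function name and details are illustrative assumptions rather than the authors' algorithm.

```python
import numpy as np

def projected_changepoint(X):
    """Project a p x n data matrix onto the leading left singular vector of
    its CUSUM transformation and locate the change point as the argmax of
    the projected CUSUM statistic."""
    p, n = X.shape
    S = np.cumsum(X, axis=1)                              # partial sums
    t = np.arange(1, n)                                   # candidate locations
    scale = np.sqrt(n / (t * (n - t)))
    T = scale * (S[:, :-1] - (t / n) * S[:, -1][:, None])  # p x (n-1) CUSUM matrix
    U, _, _ = np.linalg.svd(T, full_matrices=False)
    v = U[:, 0]                                           # projection direction
    return int(np.argmax(np.abs(v @ T))) + 1              # estimated change point

# Toy usage: 20 series, mean change in the first group of 5 series at time 50.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 100))
X[:5, 50:] += 1.0
print(projected_changepoint(X))
```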
This paper deals with the optimization of industrial asset management strategies, whose profitability is characterized by the Net Present Value (NPV) indicator, assessed by a Monte Carlo simulator. The developed method consists in building a metamodel of this stochastic simulator, allowing one to obtain, for a given model input, the NPV probability distribution without running the simulator. The present work concentrates on emulating the quantile function of the stochastic simulator by interpolating well-chosen basis functions and metamodeling their coefficients (using a Gaussian process metamodel). This quantile function metamodel is then used to treat a maintenance strategy optimization problem (four systems installed on different plants), in order to optimize an NPV quantile. Within the Gaussian process framework, an adaptive design method (called QFEI) is defined by extending the well-known EGO algorithm to our case. This allows an optimal solution to be obtained using a small number of simulator runs.
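As background, the EGO-style expected-improvement step that QFEI builds on is sketched below with a Gaussian process surrogate; the objective, design points, and the function `expected_improvement` are illustrative assumptions, and the paper's QFEI criterion adapts this idea to a quantile of the stochastic NPV output rather than a deterministic response.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(gp, X_cand, y_best):
    """Expected improvement for maximisation under a GP surrogate."""
    mu, sd = gp.predict(X_cand, return_std=True)
    sd = np.maximum(sd, 1e-12)
    z = (mu - y_best) / sd
    return (mu - y_best) * norm.cdf(z) + sd * norm.pdf(z)

# Toy usage: surrogate of a scalar objective (e.g. an estimated NPV quantile).
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(8, 1))                  # initial design
y = np.sin(X).ravel()                                # hypothetical responses
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
X_cand = np.linspace(0, 10, 200).reshape(-1, 1)
x_next = X_cand[np.argmax(expected_improvement(gp, X_cand, y.max()))]  # next run
```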