The $p_0$ model is an exponential random graph model for directed networks with the bi-degree sequence as the exclusively sufficient statistic. It captures the network feature of degree heterogeneity. The consistency and asymptotic normality of a differentially private estimator of the parameter in the private $p_0$ model have been established. However, the $p_0$ model focuses only on binary edges. In many realistic networks, edges may be weighted, taking values in a finite discrete set. In this paper, we further show that the moment estimators of the parameters based on the differentially private bi-degree sequence in the weighted $p_0$ model are consistent and asymptotically normal. Numerical studies demonstrate our theoretical findings.
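As a rough illustration of what a differentially private bi-degree sequence could look like, the sketch below computes the out- and in-degrees of a weighted directed network and perturbs each with discrete Laplace noise. The choice of the discrete Laplace mechanism, the function names, and the default `sensitivity=2` are assumptions made for illustration; the abstract does not specify the release mechanism, and the correct sensitivity depends on the range of edge weights.

```python
import math
import random


def bi_degree(adj):
    """Return (out_degrees, in_degrees) of a weighted directed adjacency matrix."""
    n = len(adj)
    out_deg = [sum(adj[i][j] for j in range(n) if j != i) for i in range(n)]
    in_deg = [sum(adj[i][j] for i in range(n) if i != j) for j in range(n)]
    return out_deg, in_deg


def discrete_laplace(alpha, rng):
    """Sample integer noise with P(k) proportional to alpha**abs(k), 0 < alpha < 1."""
    def geometric():
        # Inverse-transform sample of a geometric variable on {0, 1, 2, ...}.
        u = 1.0 - rng.random()  # u lies in (0, 1]
        return int(math.floor(math.log(u) / math.log(alpha)))
    # The difference of two i.i.d. geometric variables is discrete Laplace.
    return geometric() - geometric()


def private_bi_degree(adj, epsilon, sensitivity=2, seed=None):
    """Add discrete Laplace noise to every out- and in-degree (illustrative only).

    `sensitivity` is an assumed L1 sensitivity of the bi-degree sequence;
    2 corresponds to changing one binary edge.
    """
    rng = random.Random(seed)
    alpha = math.exp(-epsilon / sensitivity)
    out_deg, in_deg = bi_degree(adj)
    noisy_out = [d + discrete_laplace(alpha, rng) for d in out_deg]
    noisy_in = [d + discrete_laplace(alpha, rng) for d in in_deg]
    return noisy_out, noisy_in
```

For example, `private_bi_degree([[0, 2, 1], [0, 0, 3], [1, 0, 0]], epsilon=1.0, seed=0)` returns two lists of noisy integer degrees, from which moment estimators of the node parameters would then be computed.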
The edges in networks are not only binary, either present or absent, but also take weighted values in many scenarios (e.g., the number of emails between two users). The covariate-$p_0$ model has been proposed to model binary directed networks with the degree heterogeneity and covariates. However, it may cause information loss when it is applied in weighted networks. In this paper, we propose to use the Poisson distribution to model weighted directed networks, which admits the sparsity of networks, the degree heterogeneity and the homophily caused by covariates of nodes. We call it the \emph{network Poisson model}. The model contains a density parameter $\mu$, a $2n$-dimensional node parameter $\theta$ and a fixed dimensional regression coefficient $\gamma$ of covariates. Since the number of parameters increases with $n$, asymptotic theory is nonstandard. When the number $n$ of nodes goes to infinity, we establish the $\ell_\infty$-errors for the maximum likelihood estimators (MLEs), $\hat{\theta}$ and $\hat{\gamma}$, which are $O_p( (\log n/n)^{1/2} )$ for $\hat{\theta}$ and $O_p( \log n/n)$ for $\hat{\gamma}$, up to an additional factor. We also obtain the asymptotic normality of the MLE. Numerical studies and a data analysis demonstrate our theoretical findings.
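To make the parameter structure concrete, the following sketch simulates a weighted directed network with Poisson edge weights. It assumes a log-linear mean, $\log E(a_{ij}) = \mu + \alpha_i + \beta_j + z_{ij}^\top \gamma$, with the node parameter split as $\theta = (\alpha, \beta)$; this link function, the split, and all function names are assumptions for illustration, since the abstract does not state the exact form of the model.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (fine for small means)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_network(mu, alpha, beta, gamma, covariates, seed=None):
    """Simulate edge weights a_ij ~ Poisson(exp(mu + alpha_i + beta_j + z_ij . gamma)).

    `covariates[i][j]` is the covariate vector z_ij of the ordered pair (i, j).
    The log-linear mean is an assumed form, not taken from the source.
    """
    rng = random.Random(seed)
    n = len(alpha)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-loops
            z = covariates[i][j]
            linear = mu + alpha[i] + beta[j] + sum(g * x for g, x in zip(gamma, z))
            adj[i][j] = poisson_sample(math.exp(linear), rng)
    return adj
```

A strongly negative `mu` yields mostly zero weights, which is how a density parameter of this kind can accommodate sparse networks, while `alpha` and `beta` control out- and in-degree heterogeneity.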
We are concerned here with unrestricted maximum likelihood estimation in a sparse $p_0$ model with covariates for directed networks. The model has a density parameter $\nu$, a $2n$-dimensional node parameter $\boldsymbol{\eta}$ and a fixed dimensional regression coefficient $\boldsymbol{\gamma}$ of covariates. Previous studies focus on the restricted likelihood inference. When the number of nodes $n$ goes to infinity, we derive the $\ell_\infty$-error between the maximum likelihood estimator (MLE) $(\widehat{\boldsymbol{\eta}}, \widehat{\boldsymbol{\gamma}})$ and its true value $(\boldsymbol{\eta}, \boldsymbol{\gamma})$. They are $O_p( (\log n/n)^{1/2} )$ for $\widehat{\boldsymbol{\eta}}$ and $O_p( \log n/n)$ for $\widehat{\boldsymbol{\gamma}}$, up to an additional factor. This explains the asymptotic bias phenomenon in the asymptotic normality of $\widehat{\boldsymbol{\gamma}}$ in \cite{Yan-Jiang-Fienberg-Leng2018}. Further, we derive the asymptotic normality of the MLE. Numerical studies and a data analysis demonstrate our theoretical findings.
Common datasets have the form of elements with keys (e.g., transactions and products) and the goal is to perform analytics on the aggregated form of key and frequency pairs. A weighted sample of keys by (a function of) frequency is a highly versatile summary that provides a sparse set of representative keys and supports approximate evaluations of query statistics. We propose private weighted sampling (PWS): A method that ensures element-level differential privacy while retaining, to the extent possible, the utility of a respective non-private weighted sample. PWS maximizes the reporting probabilities of keys and estimation quality of a broad family of statistics. PWS improves over the state of the art also for the well-studied special case of private histograms, when no sampling is performed. We empirically demonstrate significant performance gains compared with prior baselines: 20%-300% increase in key reporting for common Zipfian frequency distributions and accuracy for $\times 2$-$8$ lower frequencies in estimation tasks. Moreover, PWS is applied as a simple post-processing of a non-private sample, without requiring the original data. This allows for seamless integration with existing implementations of non-private schemes and retaining the efficiency of schemes designed for resource-constrained settings such as massive distributed or streamed data. We believe that due to practicality and performance, PWS may become a method of choice in applications where privacy is desired.
Differential privacy provides a rigorous framework for privacy-preserving data analysis. This paper proposes the first differentially private procedure for controlling the false discovery rate (FDR) in multiple hypothesis testing. Inspired by the Benjamini-Hochberg procedure (BHq), our approach is to first repeatedly add noise to the logarithms of the $p$-values to ensure differential privacy and to select an approximately smallest $p$-value serving as a promising candidate at each iteration; the selected $p$-values are further supplied to the BHq and our private procedure releases only the rejected ones. Moreover, we develop a new technique that is based on a backward submartingale for proving FDR control of a broad class of multiple testing procedures, including our private procedure, and both the BHq step-up and step-down procedures. As a novel aspect, the proof works for arbitrary dependence between the true null and false null test statistics, while FDR control is maintained up to a small multiplicative factor.
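For orientation, the non-private building block named above, the Benjamini-Hochberg step-up procedure, can be sketched as follows. This is the standard BHq only; it does not implement the noise addition or the iterative candidate selection of the private procedure, and the function name is chosen here for illustration.

```python
def benjamini_hochberg(p_values, q):
    """Standard (non-private) BH step-up procedure at nominal FDR level q.

    Returns the sorted indices of the rejected hypotheses.
    """
    m = len(p_values)
    # Indices ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= q * k / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])
```

In the private procedure described in the abstract, the input to this step would be the $p$-values selected after adding noise to their logarithms, rather than the full original list.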
In this paper, we introduce a new three-parameter distribution based on combining a re-parametrization of the so-called EGNB2 and transmuted exponential distributions. This combination aims to modify the transmuted exponential distribution by incorporating an additional parameter, mainly adding a high degree of flexibility in the mode and affecting the skewness and kurtosis of the tail. We explore some mathematical properties of this distribution, including the hazard rate function, moments, the moment generating function, the quantile function, various entropy measures and (reversed) residual life functions. A statistical study investigates estimation of the parameters by the method of maximum likelihood. The distribution, along with other existing distributions, is fitted to two environmental data sets, and its superior performance is assessed using goodness-of-fit tests. As a result, some environmental measures associated with these data are obtained, such as the return level and the mean deviation about this level.
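For reference, the baseline transmuted exponential distribution that the new model modifies is commonly written as below. The symbols $\theta$ (rate) and $\lambda$ (transmutation parameter) are notation assumed here, and the display covers only the two-parameter baseline, not the three-parameter distribution proposed in the paper.

```latex
% Exponential baseline: G(x) = 1 - e^{-\theta x}, \quad x > 0,\ \theta > 0.
% Quadratic rank transmutation with |\lambda| \le 1:
\begin{align*}
F(x) &= (1+\lambda)\,G(x) - \lambda\,G(x)^2
      = \bigl(1 - e^{-\theta x}\bigr)\bigl(1 + \lambda e^{-\theta x}\bigr), \\
f(x) &= \theta e^{-\theta x}\bigl(1 - \lambda + 2\lambda e^{-\theta x}\bigr).
\end{align*}
```

Setting $\lambda = 0$ recovers the ordinary exponential distribution.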