The article considers the problem of estimating a high-dimensional sparse parameter in the presence of side information that encodes the sparsity structure. We develop a general framework that first uses an auxiliary sequence to capture the side information, and then incorporates the auxiliary sequence in inference to reduce the estimation risk. The proposed method, which carries out adaptive SURE-thresholding using side information (ASUS), is shown to have robust performance and to enjoy optimality properties. We develop new theory to characterize regimes in which ASUS far outperforms competing shrinkage estimators, and establish precise conditions under which ASUS is asymptotically optimal. Simulation studies show that ASUS substantially improves on existing methods in many settings. The methodology is applied to the analysis of data from single-cell virology studies and microarray time-course experiments.
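To make the thresholding step concrete, here is a minimal Python sketch, assuming unit-variance Gaussian noise, a two-group partition at a single known cut point on the auxiliary sequence, and illustrative function names; the actual ASUS procedure also selects the grouping data-adaptively.

```python
import numpy as np

def sure_soft(x, t):
    # Stein's unbiased risk estimate of the soft-threshold rule at
    # threshold t, under unit-variance Gaussian noise.
    return (x.size
            - 2.0 * np.sum(np.abs(x) <= t)
            + np.sum(np.minimum(np.abs(x), t) ** 2))

def asus_sketch(x, s, cut):
    # Toy ASUS-style estimator: the auxiliary sequence s splits the
    # coordinates into two groups at `cut`; each group receives its
    # own SURE-minimizing soft threshold.
    theta = np.zeros_like(x, dtype=float)
    for mask in (s <= cut, s > cut):
        if not mask.any():
            continue
        grid = np.unique(np.abs(x[mask]))  # candidate thresholds
        t = min(grid, key=lambda u: sure_soft(x[mask], u))
        theta[mask] = np.sign(x[mask]) * np.maximum(np.abs(x[mask]) - t, 0.0)
    return theta
```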
Adaptive multiple testing with covariates is an important research direction that has gained major attention in recent years. It has been widely recognized that leveraging side information provided by auxiliary covariates can improve the power of false discovery rate (FDR) procedures. Currently, most such procedures are devised with $p$-values as their main statistics. However, for two-sided hypotheses, the usual data processing step that transforms the primary statistics, known as $z$-values, into $p$-values not only leads to a loss of information carried by the main statistics, but can also undermine the ability of the covariates to assist with the FDR inference. We develop a $z$-value based covariate-adaptive (ZAP) methodology that operates on the intact structural information encoded jointly by the $z$-values and covariates. It seeks to emulate the oracle $z$-value procedure via a working model, and its rejection regions depart significantly from those of $p$-value based adaptive testing approaches. The key strength of ZAP is that FDR control is guaranteed under minimal assumptions, even when the working model is misspecified. We demonstrate the state-of-the-art performance of ZAP on both simulated and real data, showing that the efficiency gain over $p$-value based methods can be substantial. Our methodology is implemented in the $\texttt{R}$ package $\texttt{zap}$.
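At this level of description, oracle emulation shares a generic skeleton: a working model for the $z$-values given covariates produces estimated local FDRs, hypotheses are ranked by them, and the rejection set is grown while an FDR estimate stays below the target. The Python sketch below shows only that skeleton; ZAP's actual working model and its misspecification-robust FDR-control device are not reproduced here.

```python
import numpy as np

def reject_by_lfdr(lfdr_hat, alpha=0.05):
    # Rank hypotheses by estimated local FDR (from any working model
    # of z-values given covariates) and reject the largest prefix
    # whose running average stays below alpha.
    order = np.argsort(lfdr_hat)
    running_avg = np.cumsum(lfdr_hat[order]) / np.arange(1, lfdr_hat.size + 1)
    k = np.max(np.nonzero(running_avg <= alpha)[0], initial=-1) + 1
    rejected = np.zeros(lfdr_hat.size, dtype=bool)
    rejected[order[:k]] = True
    return rejected
```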
Estimation of a precision matrix (i.e., inverse covariance matrix) is widely used to exploit conditional independence among continuous variables, and the influence of abnormal observations on this estimation is exacerbated as the dimensionality increases. In this work, we propose robust estimation of the inverse covariance matrix based on an $l_1$ regularized objective function with a weighted sample covariance matrix. The robustness of the proposed objective function can be justified nonparametrically via the integrated squared error criterion. To address the non-convexity of the objective function, we develop an efficient algorithm in the spirit of majorization-minimization. Asymptotic consistency of the proposed estimator is also established. The performance of the proposed method is compared with several existing approaches via numerical simulations. We further demonstrate the merits of the proposed method with an application to genetic network inference.
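A minimal sketch of the weight-then-regularize pipeline, assuming a simple Gaussian-kernel weighting rule in place of the paper's ISE-derived weights, and using scikit-learn's graphical lasso solver rather than the paper's majorization-minimization algorithm:

```python
import numpy as np
from sklearn.covariance import graphical_lasso

def weighted_glasso_sketch(X, reg=0.1):
    # Toy robust precision-matrix estimate: downweight observations far
    # from the coordinatewise median, form a weighted sample covariance,
    # then solve the l1-regularized graphical lasso problem.
    center = np.median(X, axis=0)
    d2 = np.sum((X - center) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * np.median(d2)))   # crude kernel weights
    w /= w.sum()
    mu = w @ X                                # weighted mean
    Xc = X - mu
    S = (Xc * w[:, None]).T @ Xc              # weighted sample covariance
    _, precision = graphical_lasso(S, alpha=reg)
    return precision
```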
One central goal of the design of observational studies is to embed non-experimental data into an approximate randomized controlled trial using statistical matching. Researchers then make the randomization assumption in their downstream outcome analysis. For a matched-pair design, the randomization assumption states that treatment assignments across matched pairs are independent, and that within each pair the probability that the first subject receives treatment and the second receives control equals the probability of the reverse assignment. In this article, we develop a novel framework for testing the randomization assumption, based on solving a clustering problem with side information using modern statistical learning tools. Our testing framework is nonparametric, finite-sample exact, and distinct from previous proposals in that it can be used to test a relaxed version of the randomization assumption called the biased randomization assumption. One important by-product of our testing framework is a quantity called the residual sensitivity value (RSV), which quantifies the minimal residual confounding due to observed covariates not being well matched. We advocate taking RSV into account in the downstream primary analysis. The proposed methodology is illustrated by re-examining a famous observational study concerning the effect of right heart catheterization (RHC) in the initial care of critically ill patients.
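To see why the assumption is testable at all, note that under the null either unit in a pair is equally likely to be treated, so for any fixed covariate score the event "the treated unit has the larger score" is a fair coin flip, independently across pairs. The exact sign test below is a deliberately simplified stand-in for the paper's learned clustering statistic and does not cover the biased-randomization null:

```python
import numpy as np
from scipy.stats import binomtest

def randomization_sign_test(g_treated, g_control):
    # g_treated[i], g_control[i]: a covariate score for the treated and
    # control unit of pair i.  Under the randomization assumption the
    # number of pairs where the treated unit scores higher is
    # Binomial(n, 1/2), giving a finite-sample exact p-value.
    wins = int(np.sum(g_treated > g_control))  # ties count as losses here
    return binomtest(wins, n=len(g_treated), p=0.5).pvalue
```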
In this paper, we consider the distributed mean estimation problem in which the server has access to some side information, e.g., its locally computed mean estimate or the information received from the distributed clients in previous iterations. We propose a practical and efficient estimator based on the $r$-bit Wyner-Ziv estimator proposed by Mayekar et al., which requires no probabilistic assumption on the data. Unlike the scheme of Mayekar et al., which only utilizes side information at the server, ours jointly exploits the correlation between the clients' data and the server's side information, as well as the correlation among the data of different clients. We derive an upper bound on the estimation error of the proposed estimator, and based on this bound we provide two algorithms for choosing the estimator's input parameters. Finally, we characterize the parameter regimes in which our estimator outperforms the previous one.
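The modulo-quantization idea behind such Wyner-Ziv estimators can be sketched coordinatewise as follows; the dithering and parameter tuning of the actual scheme are omitted, and the step size `delta` and function names are illustrative assumptions:

```python
import numpy as np

def wz_encode(x, delta, r):
    # Send only the quantization-bin index modulo 2**r, i.e. r bits
    # per coordinate regardless of the range of x.
    return np.floor(x / delta).astype(np.int64) % (2 ** r)

def wz_decode(msg, y, delta, r):
    # Among all bins congruent to msg mod 2**r, pick the one whose
    # center is closest to the side information y; decoding recovers
    # x's bin roughly whenever |x - y| < 2**(r - 1) * delta.
    M = 2 ** r
    k = np.round((y / delta - 0.5 - msg) / M)  # nearest consistent coset
    return (k * M + msg + 0.5) * delta         # center of the chosen bin
```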
A great challenge to steganography has arisen with the wide application of steganalysis methods based on convolutional neural networks (CNNs). In response, embedding cost learning frameworks based on generative adversarial networks (GANs) have been proposed and have achieved success for spatial-domain steganography. However, the application of GANs to JPEG steganography is still at the prototype stage; both its resistance to detection and its training efficiency need improvement. In conventional steganography, research has shown that side information calculated from the precover can be used to enhance security; however, this side information is hard to calculate without the spatial-domain image. In this work, we propose an embedding cost learning framework for JPEG steganography via a generative adversarial network (JS-GAN), in which the learned embedding costs can be further adjusted asymmetrically according to estimated side information. Experimental results demonstrate that the proposed method can automatically learn a content-adaptive embedding cost function, and that proper use of the estimated side information can effectively improve security. For example, against the classic steganalyzer GFR at quality factor 75 and payload 0.4 bpnzAC, JS-GAN increases the detection error by 2.58% over J-UNIWARD, and the version aided by estimated side information, JS-GAN(ESI), further improves security by 11.25% over JS-GAN.
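As an illustration of the asymmetric adjustment, the sketch below scales down the cost of a change that points in the direction of the estimated rounding error; the $(1 - 2|e|)$ factor is the common heuristic from side-informed steganography, not necessarily the exact rule used in JS-GAN(ESI):

```python
import numpy as np

def adjust_costs(rho_plus, rho_minus, e):
    # rho_plus / rho_minus: learned embedding costs of +1 / -1 changes
    # per DCT coefficient; e: estimated rounding error in [-0.5, 0.5].
    # Changing a coefficient in the direction of e partially undoes
    # rounding, so that direction's cost is scaled by (1 - 2|e|).
    scale = 1.0 - 2.0 * np.abs(e)
    rho_p = np.where(e > 0, rho_plus * scale, rho_plus)
    rho_m = np.where(e < 0, rho_minus * scale, rho_minus)
    return rho_p, rho_m
```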