Selecting the optimal Markowitz portfolio depends on estimating the covariance matrix of the returns of $N$ assets from $T$ periods of historical data. Problematically, $N$ is typically of the same order as $T$, which makes the sample covariance matrix estimator perform poorly, both empirically and theoretically. While various other general-purpose covariance matrix estimators have been introduced in the financial economics and statistics literature to deal with the high dimensionality of this problem, we here propose an estimator that exploits the fact that assets are typically positively dependent. This is achieved by imposing that the joint distribution of returns be multivariate totally positive of order 2 ($\text{MTP}_2$). This constraint on the covariance matrix not only enforces positive dependence among the assets, but also regularizes the covariance matrix, leading to desirable statistical properties such as sparsity. Based on stock-market data spanning more than thirty years, we show that estimating the covariance matrix under $\text{MTP}_2$ outperforms previous state-of-the-art methods, including shrinkage estimators and factor models.
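To make the $\text{MTP}_2$ constraint concrete, the following minimal sketch assumes Gaussian returns (an assumption not stated in the abstract). In that case the distribution is $\text{MTP}_2$ exactly when every off-diagonal entry of the inverse covariance matrix is non-positive. The function names `invert` and `is_gaussian_mtp2` and the two example matrices are hypothetical illustrations, not the authors' estimator.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Pick the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate the current column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # The right half now holds the inverse.
    return [row[n:] for row in aug]


def is_gaussian_mtp2(cov, tol=1e-10):
    """Return True if all off-diagonal precision entries are <= 0 (up to tol).

    Assumes a Gaussian model, where this condition is equivalent to MTP2.
    """
    precision = invert(cov)
    n = len(precision)
    return all(
        precision[i][j] <= tol for i in range(n) for j in range(n) if i != j
    )


# Hypothetical covariance with AR(1)-type structure: tridiagonal precision matrix.
cov_a = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]
# Hypothetical covariance with all-positive entries but a positive precision entry.
cov_b = [[1.0, 0.6, 0.1], [0.6, 1.0, 0.6], [0.1, 0.6, 1.0]]

print(is_gaussian_mtp2(cov_a))
print(is_gaussian_mtp2(cov_b))
```

The second matrix illustrates that positive pairwise covariances alone do not imply $\text{MTP}_2$; the constraint is a condition on the inverse covariance matrix.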
During the last few decades, online controlled experiments (also known as A/B tests) have been adopted as the gold standard for measuring business improvements in industry. In our company, more than a billion users participate in thousands of experiments simultaneously, and statistical inference and estimation are routinely conducted on thousands of online metrics in those experiments, so computational cost becomes a major concern. In this paper we propose a novel algorithm for estimating the covariance of online metrics, which introduces more flexibility into the trade-off between computational cost and precision in covariance estimation. This covariance estimation method reduces the computational cost of metric calculation in large-scale settings, which facilitates applications in both online controlled experiments and adaptive experiments, such as variance reduction, continuous monitoring, and Bayesian optimization, and it can be easily implemented in engineering practice.
We consider high-dimensional measurement errors with high-frequency data. Our focus is on recovering the covariance matrix of the random errors with optimality. In this problem, not all components of the random vector are observed at the same time and the measurement errors are latent variables, leading to major challenges besides high data dimensionality. We propose a new covariance matrix estimator in this context with appropriate localization and thresholding. By developing a new technical device integrating the high-frequency data feature with the conventional notion of $\alpha$-mixing, our analysis successfully accommodates the challenging serial dependence in the measurement errors. Our theoretical analysis establishes the minimax optimal convergence rates associated with two commonly used loss functions. We then establish cases when the proposed localized estimator with thresholding achieves the minimax optimal convergence rates. Considering that the variances and covariances can be small in reality, we conduct a second-order theoretical analysis that further disentangles the dominating bias in the estimator. A bias-corrected estimator is then proposed to ensure its practical finite sample performance. We illustrate the promising empirical performance of the proposed estimator with extensive simulation studies and a real data analysis.
In a Gaussian graphical model, conditional independence between two variables is characterized by the corresponding zero entry in the inverse covariance matrix. Maximum likelihood methods using the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li, 2001) and the adaptive LASSO penalty (Zou, 2006) have been proposed in the literature. In this article, we establish that using the Bayesian information criterion (BIC) to select the tuning parameter in penalized likelihood estimation with both types of penalties can lead to consistent graphical model selection. We compare the empirical performance of BIC with that of cross-validation and demonstrate the advantage of BIC for tuning parameter selection through simulation studies.
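As an illustration of how BIC can serve as a tuning criterion here, one commonly used form for Gaussian graphical models is sketched below. The notation is an assumption introduced for this sketch and does not come from the abstract: $S$ is the sample covariance matrix, $n$ the sample size, $\hat{\Omega}_\lambda$ the penalized estimate of the inverse covariance matrix at tuning parameter $\lambda$, and $k_\lambda$ the number of nonzero entries in the upper off-diagonal part of $\hat{\Omega}_\lambda$. The exact criterion analyzed in the article may differ.

$$
\mathrm{BIC}(\lambda) = -\log\det\hat{\Omega}_\lambda + \operatorname{tr}\!\left(S\,\hat{\Omega}_\lambda\right) + k_\lambda\,\frac{\log n}{n}
$$

Under this form, the selected tuning parameter would be the $\lambda$ that minimizes $\mathrm{BIC}(\lambda)$ over a grid of candidate values: the first two terms reward fit to the data, while the last term penalizes each additional edge in the estimated graph.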
In this paper, we estimate a high-dimensional precision matrix under the weak sparsity condition, where many entries are nearly zero. We study a Lasso-type method for high-dimensional precision matrix estimation and derive general error bounds under the weak sparsity condition. The common irrepresentable condition is relaxed, and the results are applicable to weakly sparse matrices. As applications, we study precision matrix estimation for heavy-tailed data, nonparanormal data, and matrix data with the Lasso-type method.
We propose a novel pilot structure for covariance matrix estimation in massive multiple-input multiple-output (MIMO) systems in which each user transmits two pilot sequences, with the second pilot sequence multiplied by a random phase-shift. The covariance matrix of a particular user is obtained by computing the sample cross-correlation of the channel estimates obtained from the two pilot sequences. This approach relaxes the requirement that all the users transmit their uplink pilots over the same set of symbols. We derive expressions for the achievable rate and the mean-squared error of the covariance matrix estimate when the proposed method is used with staggered pilots. The performance of the proposed method is compared with existing methods through simulations.
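The sample cross-correlation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `sample_cross_correlation` is hypothetical, each channel estimate is a list of complex numbers (one per antenna), and any compensation for the random phase shift of the second pilot is assumed to have been applied to the second set of estimates beforehand.

```python
def sample_cross_correlation(first_estimates, second_estimates):
    """Average the outer products h1 * h2^H over all coherence blocks.

    first_estimates, second_estimates: lists of equal length; entry n holds
    the channel estimate (list of complex values) from the first and second
    pilot sequence in block n. Phase compensation is assumed already done.
    """
    if not first_estimates or len(first_estimates) != len(second_estimates):
        raise ValueError("need two non-empty lists of equal length")
    num_blocks = len(first_estimates)
    dim = len(first_estimates[0])
    acc = [[0j] * dim for _ in range(dim)]
    for h1, h2 in zip(first_estimates, second_estimates):
        for i in range(dim):
            for j in range(dim):
                # Entry (i, j) of the outer product h1 * conj(h2)^T.
                acc[i][j] += h1[i] * h2[j].conjugate()
    return [[value / num_blocks for value in row] for row in acc]


# Hypothetical two-antenna example with a single block and identical estimates.
estimates = [[1 + 0j, 1j]]
covariance_estimate = sample_cross_correlation(estimates, estimates)
print(covariance_estimate)
```

Using two separate estimates instead of one means that estimation noise that is independent across the two pilot sequences averages out of the cross-correlation, whereas it would add a bias term to the sample autocorrelation of a single estimate.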