No Arabic abstract
Factor and sparse models are two widely used methods to impose a low-dimensional structure in high-dimension. They are seemingly mutually exclusive. We propose a lifting method that combines the merits of these two models in a supervised learning methodology that allows to efficiently explore all the information in high-dimensional datasets. The method is based on a flexible model for high-dimensional panel data, called factor-augmented regression (FarmPredict) model with both observable or latent common factors, as well as idiosyncratic components. This model not only includes both principal component (factor) regression and sparse regression as specific models but also significantly weakens the cross-sectional dependence and hence facilitates model selection and interpretability. The methodology consists of three steps. At each step, the remaining cross-section dependence can be inferred by a novel test for covariance structure in high-dimensions. We developed asymptotic theory for the FarmPredict model and demonstrated the validity of the multiplier bootstrap for testing high-dimensional covariance structure. This is further extended to testing high-dimensional partial covariance structures. The theory is supported by a simulation study and applications to the construction of a partial covariance network of the financial returns and a prediction exercise for a large panel of macroeconomic time series from FRED-MD database.
This paper studies the estimation of network connectedness with focally sparse structure. We try to uncover the network effect with a flexible sparse deviation from a predetermined adjacency matrix. To be more specific, the sparse deviation structure can be regarded as latent or misspecified linkages. To obtain high-quality estimator for parameters of interest, we propose to use a double regularized high-dimensional generalized method of moments (GMM) framework. Moreover, this framework also facilitates us to conduct the inference. Theoretical results on consistency and asymptotic normality are provided with accounting for general spatial and temporal dependency of the underlying data generating processes. Simulations demonstrate good performance of our proposed procedure. Finally, we apply the methodology to study the spatial network effect of stock returns.
The Pareto model is very popular in risk management, since simple analytical formulas can be derived for financial downside risk measures (Value-at-Risk, Expected Shortfall) or reinsurance premiums and related quantities (Large Claim Index, Return Period). Nevertheless, in practice, distributions are (strictly) Pareto only in the tails, above (possible very) large threshold. Therefore, it could be interesting to take into account second order behavior to provide a better fit. In this article, we present how to go from a strict Pareto model to Pareto-type distributions. We discuss inference, and derive formulas for various measures and indices, and finally provide applications on insurance losses and financial risks.
This paper proposes a logistic undirected network formation model which allows for assortative matching on observed individual characteristics and the presence of edge-wise fixed effects. We model the coefficients of observed characteristics to have a latent community structure and the edge-wise fixed effects to be of low rank. We propose a multi-step estimation procedure involving nuclear norm regularization, sample splitting, iterative logistic regression and spectral clustering to detect the latent communities. We show that the latent communities can be exactly recovered when the expected degree of the network is of order log n or higher, where n is the number of nodes in the network. The finite sample performance of the new estimation and inference methods is illustrated through both simulated and real datasets.
We consider identification and estimation of nonseparable sample selection models with censored selection rules. We employ a control function approach and discuss different objects of interest based on (1) local effects conditional on the control function, and (2) global effects obtained from integration over ranges of values of the control function. We derive the conditions for the identification of these different objects and suggest strategies for estimation. Moreover, we provide the associated asymptotic theory. These strategies are illustrated in an empirical investigation of the determinants of female wages in the United Kingdom.
In nonseparable triangular models with a binary endogenous treatment and a binary instrumental variable, Vuong and Xu (2017) show that the individual treatment effects (ITEs) are identifiable. Feng, Vuong and Xu (2019) show that a kernel density estimator that uses nonparametrically estimated ITEs as observations is uniformly consistent for the density of the ITE. In this paper, we establish the asymptotic normality of the density estimator of Feng, Vuong and Xu (2019) and show that despite their faster rate of convergence, ITEs estimation errors have a non-negligible effect on the asymptotic distribution of the density estimator. We propose asymptotically valid standard errors for the density of the ITE that account for estimated ITEs as well as bias correction. Furthermore, we develop uniform confidence bands for the density of the ITE using nonparametric or jackknife multiplier bootstrap critical values. Our uniform confidence bands have correct coverage probabilities asymptotically with polynomial error rates and can be used for inference on the shape of the ITEs distribution.