For regulatory and interpretability reasons, logistic regression is still widely used. To improve prediction accuracy and interpretability, a preprocessing step quantizing both continuous and categorical data is usually performed: continuous features are discretized and, if numerous, the levels of categorical features are grouped. Even better predictive accuracy can be reached by embedding this quantization step directly into the predictive estimation step itself. In doing so, however, the predictive loss has to be optimized over a huge discrete set. To overcome this difficulty, we introduce a specific two-step optimization strategy: first, the optimization problem is relaxed by approximating the discontinuous quantization functions with smooth functions; second, the resulting relaxed problem is solved via a particular neural network. The good performance of this approach, which we call glmdisc, is illustrated on simulated and real data from the UCI library and Credit Agricole Consumer Finance (a major historic player in the European consumer credit market).
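To make the relaxation idea above concrete, the following toy sketch (Python; this is not the authors' glmdisc implementation, and the `soft_binning` function, its parameters and the cut points are all hypothetical) replaces hard bin indicators with a softmax, so that bin memberships become smooth, differentiable functions:

```python
import numpy as np

def soft_binning(x, cut_points, temperature=0.1):
    # Replace the hard indicator 1{x in bin j} by a softmax over
    # (negative) distances to bin centres: a smooth, differentiable
    # approximation of discretization.
    centres = np.sort(cut_points)
    scores = -np.abs(x[:, None] - centres[None, :]) / temperature
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=500)
W = soft_binning(x, cut_points=np.array([-1.0, 0.0, 1.0]))
print(W.shape)                                # (500, 3)
print(bool(np.allclose(W.sum(axis=1), 1.0)))  # True: soft one-hot rows
```

A logistic regression fitted on the rows of `W` yields a piecewise-constant effect of `x`, and because the softmax is differentiable in the cut points, quantization and prediction can be trained jointly, e.g. inside a neural network.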
The aim of this paper is to present a mixture composite regression model for claim severity modelling. Claim severity modelling poses several challenges, such as multimodality, heavy-tailedness and systematic effects in the data. We tackle this modelling problem by studying a mixture composite regression model that simultaneously models attritional and large claims, and that accounts for systematic effects in both the mixture components and the mixing probabilities. For model fitting, we present a group-fused regularization approach that allows us to select the explanatory variables that significantly impact the mixing probabilities and the different mixture components, respectively. We develop an asymptotic theory for this regularized estimation approach, and fitting is performed using a novel Generalized Expectation-Maximization algorithm. We exemplify our approach on a real motor insurance data set.
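As a bare-bones illustration of the E- and M-steps behind such mixture fitting (a stripped-down stand-in for the paper's Generalized EM: two Gaussian components on simulated log-severities, no covariates, no group-fused regularization; all numbers are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated log-severities: an attritional bulk and a large-claim tail
y = np.concatenate([rng.normal(7.0, 0.5, 900), rng.normal(10.0, 1.0, 100)])

def norm_pdf(v, mu, sigma):
    return np.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

pi, mu, sigma = 0.5, np.array([6.0, 11.0]), np.array([1.0, 1.0])
for _ in range(200):
    # E-step: posterior probability of the "large claim" component
    p1 = pi * norm_pdf(y, mu[1], sigma[1])
    p0 = (1.0 - pi) * norm_pdf(y, mu[0], sigma[0])
    z = p1 / (p0 + p1)
    # M-step: weighted updates of mixing weight, means and variances
    pi = z.mean()
    mu = np.array([np.average(y, weights=1.0 - z), np.average(y, weights=z)])
    sigma = np.sqrt(np.array([np.average((y - mu[0]) ** 2, weights=1.0 - z),
                              np.average((y - mu[1]) ** 2, weights=z)]))
print(round(pi, 2), mu.round(1))  # mixing weight near 0.1, means near (7, 10)
```

The paper's GEM algorithm generalizes these steps by letting both the component parameters and the mixing probability depend on covariates, with a group-fused penalty added to the M-step objective.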
We develop a novel decouple-recouple dynamic predictive strategy and contribute to the literature on forecasting and economic decision making in a data-rich environment. Under this framework, clusters of predictors generate different latent states in the form of predictive densities that are later synthesized within an implied time-varying latent factor model. As a result, the latent interdependencies across predictive densities and biases are sequentially learned and corrected. Unlike sparse modeling and variable selection procedures, we do not assume a priori that there is a given subset of active predictors that characterizes the predictive density of a quantity of interest. We test our procedure by investigating the predictive content of a large set of financial ratios and macroeconomic variables on both the equity premium across different industries and the inflation rate in the U.S., two contexts of topical interest in finance and macroeconomics. We find that our predictive synthesis framework generates both statistically and economically significant out-of-sample benefits while maintaining interpretability of the forecasting variables. In addition, the main empirical results highlight that our proposed framework outperforms LASSO-type shrinkage regressions, factor-based dimension reduction, sequential variable selection, and equal-weighted linear pooling methodologies.
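A heavily simplified sketch of the recoupling idea, assuming just two Gaussian predictive densities combined through discounted performance-based weights (this is a toy linear-pooling scheme, not the paper's latent-factor synthesis; the data and discount factor are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 300
y = rng.normal(0.5, 1.0, T)  # target series; model A is well specified below

def norm_pdf(v, mu, sigma=1.0):
    return np.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

w = np.array([0.5, 0.5])     # pooling weights for models A and B
weights_path = []
for t in range(T):
    # predictive densities evaluated at the realized outcome: A vs biased B
    dens = np.array([norm_pdf(y[t], 0.5), norm_pdf(y[t], -0.5)])
    w = (w ** 0.99) * dens   # power-discounted Bayesian-style weight update
    w = w / w.sum()
    weights_path.append(w[0])
print(round(weights_path[-1], 2))  # weight on the well-specified model, near 1
```

The point of the sketch is only that sequentially learned weights concentrate on the better-calibrated density; the paper's framework additionally models and corrects the interdependencies and biases across densities via latent factors.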
Factor structures or interactive effects are convenient devices for incorporating latent variables in panel data models. We consider fixed effect estimation of nonlinear panel single-index models with factor structures in the unobservables, which include logit, probit, ordered probit and Poisson specifications. We establish that fixed effect estimators of model parameters and average partial effects have normal distributions when the two dimensions of the panel grow large, but might suffer from incidental parameter bias. We show how models with factor structures can also be applied to capture important features of network data such as reciprocity, degree heterogeneity, homophily in latent variables and clustering. We illustrate this applicability with an empirical application estimating a gravity equation of international trade between countries using a Poisson model with multiple factors.
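For concreteness, a minimal simulation of a Poisson single-index panel with a one-factor interactive effect in the unobservables (illustrative notation and parameter values only; the paper's estimators are not implemented here):

```python
import numpy as np

# Conditional mean with an interactive effect:
#   E[y_it | x, a, f] = exp(beta * x_it + a_i * f_t)
# where a_i are individual loadings and f_t time-varying factors.
rng = np.random.default_rng(4)
N, T, beta = 200, 50, 1.0
x = rng.normal(size=(N, T))
a = rng.normal(size=(N, 1))   # individual factor loadings
f = rng.normal(size=(1, T))   # common time factors
mu = np.exp(beta * x + a * f)
y = rng.poisson(mu)
print(y.shape)  # (200, 50): a count panel with factor-driven unobservables
```

Fixed effect estimation would treat both `a` and `f` as parameters to be estimated jointly with `beta`, which is what generates the incidental parameter bias discussed in the abstract.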
We present the Stata commands probitfe and logitfe, which estimate probit and logit panel data models with individual and/or time unobserved effects. Fixed effect panel data methods that estimate the unobserved effects can be severely biased because of the incidental parameter problem (Neyman and Scott, 1948). We tackle this problem by using the analytical and jackknife bias corrections derived in Fernandez-Val and Weidner (2016) for panels where the two dimensions ($N$ and $T$) are moderately large. We illustrate the commands with an empirical application to international trade and a Monte Carlo simulation calibrated to this application.
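The logic of a split-panel jackknife bias correction can be illustrated on the classical Neyman and Scott (1948) variance example, where the fixed effect MLE is biased by the factor $(T-1)/T$ (a toy scalar analogue in Python, not the probitfe/logitfe commands themselves):

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 5000, 4
# y_it = alpha_i + eps_it with Var(eps) = 1; alpha_i are incidental parameters
y = rng.normal(0.0, 1.0, (N, T)) + rng.normal(size=(N, 1))

def mle_var(panel):
    # variance MLE after profiling out the individual effects (within demeaning)
    return ((panel - panel.mean(axis=1, keepdims=True)) ** 2).mean()

full = mle_var(y)                                   # biased: ~ (T-1)/T = 0.75
half = 0.5 * (mle_var(y[:, :T // 2]) + mle_var(y[:, T // 2:]))
jack = 2.0 * full - half                            # split-panel jackknife, ~ 1.0
print(round(full, 2), round(jack, 2))
```

The same combination, full-panel estimate minus averaged half-panel estimates, removes the leading $O(1/T)$ bias term in the probit and logit settings handled by the commands.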
In this study, we develop a novel estimation method for quantile treatment effects (QTE) under the rank invariance and rank stationarity assumptions. Ishihara (2020) explores identification of the nonseparable panel data model under these assumptions and proposes a parametric estimation procedure based on the minimum distance method. However, that minimum distance estimation is computationally demanding when the dimension of the covariates is large. To overcome this problem, we propose a two-step estimation method based on quantile regression and the minimum distance method. We then show consistency and asymptotic normality of our estimator. Monte Carlo studies indicate that our estimator performs well in finite samples. Lastly, we present two empirical illustrations, estimating the distributional effects of insurance provision on household production and of TV watching on child cognitive development.
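The first step of such a procedure rests on standard quantile regression; a self-contained sketch, minimizing the check (pinball) loss by subgradient descent on simulated data (an illustrative stand-in for a production quantile-regression solver, not the authors' estimator):

```python
import numpy as np

rng = np.random.default_rng(5)
n, tau = 2000, 0.5
x = rng.uniform(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)  # median(y | x) = 1 + 2x

X = np.column_stack([np.ones(n), x])
b = np.zeros(2)
for _ in range(5000):
    u = y - X @ b
    # subgradient of the check loss rho_tau(u) = u * (tau - 1{u < 0})
    grad = -X.T @ (tau - (u < 0)) / n
    b -= 0.5 * grad
print(b.round(1))  # close to the true coefficients [1.0, 2.0]
```

In the two-step method of the abstract, conditional quantile estimates of this kind feed a second-stage minimum distance step, avoiding a direct minimum distance search over a high-dimensional covariate space.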