In this study, we explore the partial identification of nonseparable models with continuous endogenous and binary instrumental variables. We show that the structural function is partially identified when it is monotone or concave in the explanatory variable. D'Haultfoeuille and Février (2015) and Torgovitsky (2015) prove the point identification of the structural function under a key assumption that the conditional distribution functions of the endogenous variable for different values of the instrumental variables have intersections. We demonstrate that, even if this assumption does not hold, monotonicity and concavity provide identifying power. Point identification is achieved when the structural function is flat or linear with respect to the explanatory variable over a given interval. We compute the bounds using real data and show that our bounds are informative.
This paper explores the identification and estimation of nonseparable panel data models. We show that the structural function is nonparametrically identified when it is strictly increasing in a scalar unobservable variable, the conditional distributions of unobservable variables do not change over time, and the joint support of explanatory variables satisfies some weak assumptions. To identify the target parameters, existing studies assume that the structural function does not change over time, and that there are stayers, namely individuals with the same regressor values in two time periods. Our approach, by contrast, allows the structural function to depend on the time period in an arbitrary manner and does not require the existence of stayers. In the estimation part of the paper, we consider parametric models and develop an estimator that implements our identification results. We then show the consistency and asymptotic normality of our estimator. Monte Carlo studies indicate that our estimator performs well in finite samples. Finally, we extend our identification results to models with discrete outcomes, and show that the structural function is partially identified.
Nonseparable panel models are important in a variety of economic settings, including discrete choice. This paper gives identification and estimation results for nonseparable models under time homogeneity conditions that are like "time is randomly assigned" or "time is an instrument." Partial identification results for average and quantile effects are given for discrete regressors, under static or dynamic conditions, in fully nonparametric and in semiparametric models, with time effects. It is shown that the usual, linear, fixed-effects estimator is not a consistent estimator of the identified average effect, and a consistent estimator is given. A simple estimator of identified quantile treatment effects is given, providing a solution to the important problem of estimating quantile treatment effects from panel data. Bounds for overall effects in static and dynamic models are given. The dynamic bounds provide a partial identification solution to the important problem of estimating the effect of state dependence in the presence of unobserved heterogeneity. The impact of $T$, the number of time periods, is shown by deriving shrinkage rates for the identified set as $T$ grows. We also consider semiparametric, discrete-choice models and find that semiparametric panel bounds can be much tighter than nonparametric bounds. Computationally convenient methods for semiparametric models are presented. We propose a novel inference method that applies in panel data and other settings and show that it produces uniformly valid confidence regions in large samples. We give empirical illustrations.
Frequently, empirical studies are plagued with missing data. When the data are missing not at random, the parameter of interest is not identifiable in general. Without additional assumptions, we can derive bounds on the parameters of interest, which, unfortunately, are often too wide to be informative. Therefore, it is of great importance to sharpen these worst-case bounds by exploiting additional information. Traditional missing data analysis uses only the information in the binary missing data indicator, that is, a certain data point is either missing or not. Nevertheless, real data often provide more information than a binary missing data indicator, and they often record different types of missingness. In a motivating HIV status survey, missing data may be due to the units' unwillingness to respond to the survey items or their hospitalization during the visit, and may also be due to the units' temporary absence or relocation. It is apparent that some missing types are more likely to be missing not at random, but other missing types are more likely to be missing at random. We show that making full use of the missing types results in narrower bounds on the parameters of interest. In a real-life example, we demonstrate a substantial improvement, a reduction of more than 50% in bound widths, for estimating the prevalence of HIV in rural Malawi. As we illustrate using the HIV study, our strategy is also useful for conducting sensitivity analysis by gradually increasing or decreasing the set of types that are missing at random. In addition, we propose an easy-to-implement method to construct confidence intervals for partially identified parameters with bounds expressed as the minimums and maximums of finite parameters, which is useful not only for our problem but also for many other problems involving bounds.
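The worst-case bounds described above, and the way treating some missingness types as missing at random narrows them, can be illustrated for a binary outcome. The following is only a minimal sketch under stated assumptions, not the estimator from the study: the function name `prevalence_bounds`, the type labels, and all counts are hypothetical, and units of a missing-at-random type are simply assumed to share the prevalence of the observed units.

```python
def prevalence_bounds(n_pos, n_neg, missing_by_type, mar_types=()):
    """Bounds on P(Y = 1) for a binary outcome with typed missing data.

    n_pos, n_neg    : observed counts with Y = 1 and Y = 0
    missing_by_type : dict mapping a (hypothetical) missingness type to its count
    mar_types       : types assumed missing at random; every other type is
                      treated as worst case (all 0 for the lower bound,
                      all 1 for the upper bound)
    """
    n_obs = n_pos + n_neg
    n_mar = sum(c for t, c in missing_by_type.items() if t in mar_types)
    n_mnar = sum(c for t, c in missing_by_type.items() if t not in mar_types)
    n_total = n_obs + n_mar + n_mnar

    p_obs = n_pos / n_obs
    # Expected positives among observed units plus units assumed MAR.
    known_pos = (n_obs + n_mar) * p_obs

    lower = known_pos / n_total             # worst-case units all negative
    upper = (known_pos + n_mnar) / n_total  # worst-case units all positive
    return lower, upper


# Hypothetical counts, for illustration only.
missing = {"refusal": 100, "absent": 100}
print(prevalence_bounds(100, 700, missing))                         # all types worst case
print(prevalence_bounds(100, 700, missing, mar_types=("absent",)))  # "absent" assumed MAR
```

Moving types in and out of `mar_types` one at a time gives a simple form of the sensitivity analysis the abstract mentions. The sketch ignores sampling uncertainty, so it yields point estimates of the bounds rather than confidence intervals.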
We develop a novel approach to identification and model testing in linear structural equation models (SEMs) based on auxiliary variables (AVs), which generalizes a widely used family of methods known as instrumental variables. The identification problem is concerned with the conditions under which causal parameters can be uniquely estimated from an observational, non-causal covariance matrix. In this paper, we provide an algorithm for the identification of causal parameters in linear structural models that subsumes previous state-of-the-art methods. In other words, our algorithm identifies strictly more coefficients and models than methods previously known in the literature. Our algorithm builds on a graph-theoretic characterization of conditional independence relations between auxiliary and model variables, which is developed in this paper. Further, we leverage this new characterization to allow identification when limited experimental data or new substantive knowledge about the domain is available. Lastly, we develop a new procedure for model testing using AVs.
Capture-recapture (CRC) surveys are widely used to estimate the size of a population whose members cannot be enumerated directly. When $k$ capture samples are obtained, counts of unit captures in subsets of samples are represented naturally by a $2^k$ contingency table in which one element -- the number of individuals appearing in none of the samples -- remains unobserved. In the absence of additional assumptions, the population size is not point-identified. Assumptions about independence between samples are often used to achieve point-identification. However, real-world CRC surveys often use convenience samples in which independence cannot be guaranteed, and population size estimates under independence assumptions may lack empirical credibility. In this work, we apply the theory of partial identification to show that weak assumptions or qualitative knowledge about the nature of dependence between samples can be used to characterize a non-trivial set in which the true population size lies with high probability. We construct confidence sets for the population size under bounds on pairwise capture probabilities, and bounds on the highest order interaction term in a log-linear model using two methods: test inversion bootstrap confidence intervals, and profile likelihood confidence intervals. We apply these methods to recent survey data to estimate the number of people who inject drugs in Brussels, Belgium.
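For the simplest case of $k = 2$ samples, the idea of replacing the independence assumption with a bound on dependence can be sketched as follows. This is an illustrative simplification, not the procedure of the paper: the function name and the counts are hypothetical, dependence is summarized by an assumed range for the odds ratio $\theta = n_{11} n_{00} / (n_{10} n_{01})$, and the result is a plug-in interval that ignores sampling uncertainty (no bootstrap or profile likelihood).

```python
def population_size_bounds(n11, n10, n01, or_low=1.0, or_high=1.0):
    """Plug-in bounds on population size N from two capture samples.

    n11 : individuals captured in both samples
    n10 : individuals captured only in the first sample
    n01 : individuals captured only in the second sample
    or_low, or_high : assumed bounds on the odds ratio between samples
                      (both equal to 1 corresponds to independence)

    The unobserved cell satisfies n00 = theta * n10 * n01 / n11.
    """
    if n11 <= 0:
        raise ValueError("n11 must be positive")
    if not 0 < or_low <= or_high:
        raise ValueError("need 0 < or_low <= or_high")

    observed = n11 + n10 + n01
    n00_independent = n10 * n01 / n11  # unobserved cell when theta = 1
    return (observed + or_low * n00_independent,
            observed + or_high * n00_independent)


# Hypothetical counts, for illustration only.
print(population_size_bounds(20, 80, 60))            # independence: (400.0, 400.0)
print(population_size_bounds(20, 80, 60, 0.5, 2.0))  # bounded dependence: (280.0, 640.0)
```

With both odds-ratio bounds set to 1 the interval collapses to the familiar two-sample estimate under independence; widening the assumed range trades point identification for an interval that stays valid under weaker assumptions.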