The odds ratio is used in health and social surveys when the odds of a certain event are to be compared between two populations. It is defined through logistic regression and requires that survey data be accompanied by their weights. A nonparametric estimation method that incorporates survey weights and auxiliary information may improve the precision of the odds ratio estimator. It consists of $B$-spline calibration, which can handle the nonlinear structure of the parameter. The variance is estimated through linearization. Implementation is possible with standard survey software. The gain in precision depends on the data, as shown in two examples.
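A minimal sketch of the quantity being estimated: the survey-weighted odds ratio computed from weighted cell totals of a 2x2 table, which coincides with the exponentiated slope of a weighted logistic regression with a single binary predictor. The data and weights below are hypothetical; this illustrates only the target parameter, not the $B$-spline calibration method of the abstract.

```python
# Sketch: survey-weighted odds ratio from binary outcome y, binary group g,
# and design weights w (all data hypothetical).
def weighted_or(y, g, w):
    """Odds ratio comparing the odds of y == 1 between g == 1 and g == 0,
    with each unit contributing its survey weight."""
    # Weighted cell totals of the 2x2 table.
    n11 = sum(wi for yi, gi, wi in zip(y, g, w) if yi == 1 and gi == 1)
    n01 = sum(wi for yi, gi, wi in zip(y, g, w) if yi == 0 and gi == 1)
    n10 = sum(wi for yi, gi, wi in zip(y, g, w) if yi == 1 and gi == 0)
    n00 = sum(wi for yi, gi, wi in zip(y, g, w) if yi == 0 and gi == 0)
    return (n11 / n01) / (n10 / n00)

y = [1, 0, 1, 1, 0, 0, 1, 0]
g = [1, 1, 1, 0, 0, 0, 1, 0]
w = [2.0, 1.0, 1.5, 1.0, 2.0, 1.0, 1.0, 1.5]
print(weighted_or(y, g, w))
```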
We propose a two-parameter ratio-product-ratio estimator for a finite population mean in simple random sampling without replacement, following the methodology of Ray and Sahai (1980), Sahai and Ray (1980), Sahai and Sahai (1985), and Singh and Ruiz Espejo (2003). The bias and mean square error of the proposed estimator are obtained to the first degree of approximation. We derive conditions on the parameters under which the proposed estimator has smaller mean square error than the sample mean, ratio, and product estimators. An application to groundwater data taken from a geological site in the state of Florida shows that the proposed estimator outperforms the traditional estimators.
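The abstract does not give the form of the two-parameter estimator, so the sketch below shows only the classical baselines it is compared against: the sample mean, the ratio estimator, and the product estimator of a population mean of y using an auxiliary variable x with known population mean. The toy numbers are hypothetical.

```python
# Classical estimators of the population mean of y using auxiliary variable x
# with known population mean Xbar (toy data, hypothetical).
def ratio_estimator(y, x, Xbar):
    ybar = sum(y) / len(y)
    xbar = sum(x) / len(x)
    return ybar * Xbar / xbar   # favored when y and x are positively correlated

def product_estimator(y, x, Xbar):
    ybar = sum(y) / len(y)
    xbar = sum(x) / len(x)
    return ybar * xbar / Xbar   # favored when y and x are negatively correlated

y = [10.0, 12.0, 9.0, 11.0]
x = [5.0, 6.0, 4.0, 5.0]
Xbar = 5.5                      # known from the sampling frame
print(sum(y) / len(y))          # sample mean: 10.5
print(ratio_estimator(y, x, Xbar))
print(product_estimator(y, x, Xbar))
```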
The odds ratio (OR) is a widely used measure of effect size in observational research. ORs reflect the statistical association between a binary outcome, such as the presence of a health condition, and a binary predictor, such as exposure to a pollutant. Statistical significance and interval estimates are often computed for the logarithm of the OR, ln(OR), and depend on the asymptotic standard error of ln(OR). For a sample of size N, the standard error can be written as sigma over the square root of N, where sigma is the population standard deviation of ln(OR). The ratio of ln(OR) over sigma is a standardized effect size. Unlike correlation, which is another familiar standardized statistic, the standardized ln(OR) cannot reach the values of minus one or one. We find that its maximum possible value is given by the Laplace Limit Constant (LLC = 0.6627...), which appears as a condition in solutions to Kepler's equation, one of the central equations of celestial mechanics. The range of the standardized ln(OR) is thus bounded between minus LLC and LLC, with the maximum attained at ln(OR) ~ 4.7987. This range has implications for the analysis of epidemiological associations, as it affects the behavior of reasonable prior distributions for the standardized ln(OR).
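The bound can be checked numerically. Assuming the usual per-observation variance of ln(OR) for a 2x2 table with cell probabilities p11, p10, p01, p00 (Woolf's formula, sigma^2 = 1/p11 + 1/p10 + 1/p01 + 1/p00), symmetry puts the maximum of ln(OR)/sigma on the diagonal p11 = p00, p10 = p01, so a one-dimensional grid search suffices:

```python
import math

# Numerical check of the bound on the standardized ln(OR).
# Parameterize the symmetric 2x2 table as p11 = p00 = q/2, p10 = p01 = (1-q)/2,
# with ln(OR) = ln(p11*p00 / (p10*p01)) and sigma^2 = sum of reciprocal cells
# (Woolf's formula, assumed here as the per-observation variance).
def standardized_lnor(q):
    a, b = q / 2.0, (1.0 - q) / 2.0
    lnor = math.log((a * a) / (b * b))
    sigma = math.sqrt(2.0 / a + 2.0 / b)
    return lnor / sigma

best_q = max((q / 100000.0 for q in range(1, 100000)), key=standardized_lnor)
a, b = best_q / 2.0, (1.0 - best_q) / 2.0
print(standardized_lnor(best_q))    # ~0.6627, the Laplace Limit Constant
print(math.log((a * a) / (b * b)))  # ~4.7987, where the maximum is attained
```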
During the last decade, Lévy processes with jumps have gained increasing popularity for modelling market behaviour for both derivative pricing and risk management purposes. Chan et al. (2009) introduced the use of empirical likelihood methods to estimate the parameters of various diffusion processes via their characteristic functions, which are readily available in most cases. Return series from the market are used for estimation. In addition to the return series, there are many derivatives actively traded in the market whose prices also contain information about the parameters of the underlying process. This observation motivates us, in this paper, to combine the return series and the associated derivative prices observed in the market so as to provide an estimation more reflective of market movement and to achieve a gain in efficiency. The usual asymptotic properties, including consistency and asymptotic normality, are established under suitable regularity conditions. Simulation and case studies are performed to demonstrate the feasibility and effectiveness of the proposed method.
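To illustrate why characteristic functions are convenient for estimation, here is a much simpler stand-in for the abstract's approach: a least-squares fit of the model characteristic function to the empirical one (not the empirical likelihood method of Chan et al.), for i.i.d. Gaussian returns whose CF is available in closed form, phi(t) = exp(i*t*mu - t^2*sigma^2/2). All numbers are synthetic.

```python
import cmath
import random

# Simplified illustration of characteristic-function-based estimation:
# a least-squares CF fit (NOT the empirical likelihood method of the paper).
random.seed(0)
mu_true, sigma_true = 0.01, 0.02
returns = [random.gauss(mu_true, sigma_true) for _ in range(4000)]

ts = [10.0 * k for k in range(1, 11)]    # evaluation points, scaled to return size

def ecf(t, xs):
    """Empirical characteristic function at t."""
    return sum(cmath.exp(1j * t * x) for x in xs) / len(xs)

emp = {t: ecf(t, returns) for t in ts}   # computed once

def loss(mu, sigma):
    """Squared distance between empirical and model CF on the grid."""
    return sum(abs(emp[t] - cmath.exp(1j * t * mu - 0.5 * t * t * sigma * sigma)) ** 2
               for t in ts)

mu_grid = [i / 1000.0 for i in range(-30, 31)]
sigma_grid = [j / 1000.0 for j in range(5, 41)]
mu_hat, sigma_hat = min(((m, s) for m in mu_grid for s in sigma_grid),
                        key=lambda p: loss(*p))
print(mu_hat, sigma_hat)                 # should land near (0.01, 0.02)
```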
In causal inference, principal stratification is a framework for dealing with a posttreatment intermediate variable between a treatment and an outcome, in which the principal strata are defined by the joint potential values of the intermediate variable. Because the principal strata are not fully observable, the causal effects within them, also known as the principal causal effects, are not identifiable without additional assumptions. Several previous empirical studies have leveraged auxiliary variables to improve the inference of principal causal effects. We establish a general theory for identification and estimation of principal causal effects with auxiliary variables, which provides a solid foundation for statistical inference and further insight for model building in empirical research. In particular, we consider two commonly used strategies for principal stratification problems: principal ignorability, and conditional independence between the auxiliary variable and the outcome given the principal strata and covariates. For these two strategies, we give non-parametric and semi-parametric identification results that require no modeling assumptions on the outcome. When the assumptions underlying neither strategy are plausible, we propose a large class of flexible parametric and semi-parametric models for identifying principal causal effects. Our theory not only establishes formal identification results for several models that have been used in previous empirical studies but also generalizes them to allow for different types of outcomes and intermediate variables.
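A toy simulation of the definition (not an estimation method): with a binary intermediate variable D, the principal strata are the joint potential values (D(0), D(1)), and the principal causal effect is the mean of Y(1) - Y(0) within each stratum. The simulation can compute these directly because it generates both potential values, which is exactly what is unobservable in real data. All model choices below are hypothetical.

```python
import random

# Toy illustration of principal strata defined by joint potential values
# (D(0), D(1)) of a binary intermediate variable D (hypothetical model).
random.seed(1)

strata_names = {(0, 0): "never",       # D = 0 under both treatment arms
                (0, 1): "complier",    # D moves with treatment
                (1, 0): "defier",
                (1, 1): "always"}

units = []
for _ in range(10000):
    u = random.random()                         # latent confounder
    d0 = int(random.random() < 0.2 + 0.3 * u)   # potential intermediate, control
    d1 = int(random.random() < 0.5 + 0.4 * u)   # potential intermediate, treated
    y0 = u                                      # potential outcomes
    y1 = u + 1.0 + 0.5 * d1                     # effect differs across strata
    units.append(((d0, d1), y1 - y0))

# Principal causal effect = mean of Y(1) - Y(0) within each stratum.
pce = {}
for stratum, name in strata_names.items():
    effects = [e for s, e in units if s == stratum]
    if effects:
        pce[name] = sum(effects) / len(effects)
print(pce)
```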
The article considers the problem of estimating a high-dimensional sparse parameter in the presence of side information that encodes the sparsity structure. We develop a general framework that involves first using an auxiliary sequence to capture the side information, and then incorporating the auxiliary sequence in inference to reduce the estimation risk. The proposed method, which carries out adaptive SURE-thresholding using side information (ASUS), is shown to have robust performance and enjoy optimality properties. We develop new theories to characterize regimes in which ASUS far outperforms competitive shrinkage estimators, and establish precise conditions under which ASUS is asymptotically optimal. Simulation studies show that ASUS substantially improves the performance of existing methods in many settings. The methodology is applied to the analysis of data from single-cell virology studies and microarray time-course experiments.
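A simplified sketch in the spirit of ASUS (not the paper's exact procedure): soft-threshold a sparse normal-means vector, choosing a separate threshold for each group defined by the side information by minimizing Stein's unbiased risk estimate (SURE) for the soft-threshold estimator under unit-variance Gaussian noise. The signal configuration is hypothetical.

```python
import math
import random

# Per-group SURE-tuned soft thresholding, a simplified stand-in for ASUS.
random.seed(2)

def sure(t, xs):
    """Stein's unbiased risk estimate for soft thresholding at t,
    assuming unit-variance Gaussian noise (Donoho-Johnstone formula)."""
    n = len(xs)
    return (n - 2 * sum(1 for x in xs if abs(x) <= t)
            + sum(min(x * x, t * t) for x in xs))

def soft(x, t):
    return math.copysign(max(abs(x) - t, 0.0), x)

# Side information s separates a dense-signal group from a sparse one.
theta = [3.0] * 200 + [0.0] * 1800
s = [1] * 200 + [0] * 1800
x = [m + random.gauss(0.0, 1.0) for m in theta]

grid = [k / 50.0 for k in range(0, 251)]          # candidate thresholds 0..5
est = [0.0] * len(x)
for group in (0, 1):
    idx = [i for i, si in enumerate(s) if si == group]
    xs = [x[i] for i in idx]
    t_star = min(grid, key=lambda t: sure(t, xs))  # group-specific threshold
    for i in idx:
        est[i] = soft(x[i], t_star)

risk = sum((e - m) ** 2 for e, m in zip(est, theta)) / len(x)
print(risk)    # well below 1.0, the risk of the unshrunk estimate
```

Grouping lets the sparse group take an aggressive threshold while the dense group stays nearly unshrunk, which is the intuition behind using side information.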