
Using Missing Types to Improve Partial Identification with Application to a Study of HIV Prevalence in Malawi

Published by Zhichao Jiang
Publication date: 2016
Research field: Mathematical statistics
Paper language: English





Empirical studies are frequently plagued by missing data. When the data are missing not at random, the parameter of interest is in general not identifiable. Without additional assumptions, we can derive bounds on the parameters of interest, which, unfortunately, are often too wide to be informative. It is therefore of great importance to sharpen these worst-case bounds by exploiting additional information. Traditional missing data analysis uses only the information in the binary missing data indicator, that is, whether a given data point is missing or not. Real data, however, often provide more information than a binary indicator: they record different types of missingness. In a motivating HIV status survey, data may be missing because of the units' unwillingness to respond to the survey items or their hospitalization during the visit, or because of the units' temporary absence or relocation. Some missing types are more likely to be missing not at random, while others are more likely to be missing at random. We show that making full use of the missing types results in narrower bounds on the parameters of interest. In a real-life example, estimating the prevalence of HIV in rural Malawi, we demonstrate a reduction of more than 50% in bound widths. As we illustrate using the HIV study, our strategy is also useful for conducting sensitivity analysis by gradually increasing or decreasing the set of types that are assumed to be missing at random. In addition, we propose an easy-to-implement method for constructing confidence intervals for partially identified parameters whose bounds are expressed as the minimums and maximums of finitely many parameters, which is useful not only for our problem but also for many other problems involving bounds.
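The idea of narrowing worst-case bounds with missing types can be illustrated with a minimal Python sketch. This is not the authors' estimator: the function names, the type labels, and the counts are all hypothetical, and the sketch makes the simplifying assumption that, for a type treated as missing at random, the prevalence among those units equals the prevalence among respondents (no covariates are used).

```python
def worst_case_bounds(n_pos, n_neg, missing_counts):
    """Worst-case bounds on the prevalence of a binary outcome.

    All missing units are taken to be negative for the lower bound
    and positive for the upper bound.
    """
    n_missing = sum(missing_counts.values())
    n_total = n_pos + n_neg + n_missing
    lower = n_pos / n_total
    upper = (n_pos + n_missing) / n_total
    return lower, upper


def bounds_with_mar_types(n_pos, n_neg, missing_counts, mar_types):
    """Bounds when some missing types are assumed missing at random (MAR).

    Simplifying assumption (ours, for illustration): units of a MAR type
    have the same prevalence as the respondents. Only the remaining
    types are bounded in the worst-case way.
    """
    n_obs = n_pos + n_neg
    p_obs = n_pos / n_obs  # prevalence among respondents
    n_mar = sum(c for t, c in missing_counts.items() if t in mar_types)
    n_mnar = sum(c for t, c in missing_counts.items() if t not in mar_types)
    n_total = n_obs + n_mar + n_mnar
    # Positives that are observed or imputed under the MAR assumption.
    known_pos = n_pos + p_obs * n_mar
    lower = known_pos / n_total
    upper = (known_pos + n_mnar) / n_total
    return lower, upper


if __name__ == "__main__":
    # Hypothetical counts, not taken from the Malawi study.
    counts = {"refusal": 120, "absent": 80}
    print("worst case:", worst_case_bounds(100, 700, counts))
    # Sensitivity analysis: enlarge the set of types assumed MAR step by step.
    for mar in [set(), {"absent"}, {"absent", "refusal"}]:
        lo, hi = bounds_with_mar_types(100, 700, counts, mar)
        print(sorted(mar), "->", (lo, hi), "width:", hi - lo)
```

With an empty set of MAR types the second function reproduces the worst-case bounds, and each type added to the MAR set removes that type's share of the sample from the bound width, which mirrors the sensitivity analysis described in the abstract.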




Read also

A MICROMEGAS detection amplifier has been incorporated into the design of the TAMU MDM focal plane detector with the purpose of improving the energy resolution and thus the particle identification. Beam tests showed a factor of 2 improvement over the original design, from 10-12% to 4-6%, for ions with A<40 at E/A around 10-20 MeV.
Takuya Ishihara, 2017
In this study, we explore the partial identification of nonseparable models with continuous endogenous and binary instrumental variables. We show that the structural function is partially identified when it is monotone or concave in the explanatory variable. D'Haultfoeuille and Février (2015) and Torgovitsky (2015) prove the point identification of the structural function under a key assumption that the conditional distribution functions of the endogenous variable for different values of the instrumental variables have intersections. We demonstrate that, even if this assumption does not hold, monotonicity and concavity provide identifying power. Point identification is achieved when the structural function is flat or linear with respect to the explanatory variable over a given interval. We compute the bounds using real data and show that our bounds are informative.
In the field of materials science and engineering, statistical analysis and machine learning techniques have recently been used to predict multiple material properties from an experimental design. These material properties correspond to response variables in the multivariate regression model. This study conducts a penalized maximum likelihood procedure to estimate model parameters, including the regression coefficients and covariance matrix of response variables. In particular, we employ $l_1$-regularization to achieve a sparse estimation of regression coefficients and the inverse covariance matrix of response variables. In some cases, there may be a relatively large number of missing values in response variables, owing to the difficulty in collecting data on material properties. One way to improve prediction accuracy in the presence of missing values is to incorporate a correlation structure among the response variables into the statistical model. We construct an expectation-maximization (EM) algorithm, which enables application to a data set with missing values in the responses. We apply our proposed procedure to real data consisting of 22 material properties.
Meta-analysis is the most important tool for providing high-level evidence in evidence-based medicine: it allows researchers to statistically summarize and combine data from multiple studies. In meta-analysis, mean differences are frequently used effect size measurements for continuous data, such as Cohen's d and Hedges' g statistics. To calculate mean-difference-based effect sizes, the sample mean and standard deviation are two essential summary measures. However, many clinical reports do not directly record the sample mean and standard deviation. Instead, they report the sample size, median, minimum and maximum values and/or the first and third quartiles. As a result, researchers have to transform the reported information into the sample mean and standard deviation in order to compute the effect size. Since most of the popular transformation methods were developed under the assumption that the underlying data are normally distributed, it is necessary to perform a pre-test before transforming the summary statistics. In this article, we introduce test statistics for three popular scenarios in meta-analysis. We suggest that medical researchers perform a normality test on the selected studies before using them in further analysis. Moreover, we use three different case studies to demonstrate the usage of the newly proposed test statistics. The real data case studies indicate that the new test statistics are easy to apply in practice and that, by following the recommended path to conduct the meta-analysis, researchers can obtain more reliable conclusions.
HIV-1C is the most prevalent subtype of HIV-1 and accounts for over half of HIV-1 infections worldwide. The influence of host genetics on HIV infection has previously been studied in HIV-1B, but little attention has been paid to the more prevalent subtype C. To understand the role of host genetics in HIV-1C disease progression, we perform a study to assess the association between longitudinally collected measures of disease and more than 100,000 genetic markers located on chromosome 6. The most common approach to analyzing longitudinal data in this context is linear mixed effects models, which may be overly simplistic in this case. On the other hand, existing non-parametric methods may suffer from low power due to high degrees of freedom (DF) and may be computationally infeasible at large scale. We propose a functional principal variance component (FPVC) testing framework which captures the nonlinearity in the CD4 and viral load with potentially low DF and is fast enough to carry out thousands or millions of times. The FPVC testing unfolds in two stages. In the first stage, we summarize the markers of disease progression according to their major patterns of variation via functional principal components analysis (FPCA). In the second stage, we employ a simple working model and variance component testing to examine the association between the summaries of disease progression and a set of single nucleotide polymorphisms. We supplement this analysis with simulation results which indicate that FPVC testing can offer large power gains over the standard linear mixed effects model.