The Consent-to-Contact (C2C) registry at the University of California, Irvine collects data from community participants to aid recruitment into clinical research studies. Self-selection into the C2C likely leads to bias, due in part to enrollees having more years of education than the general US population. Salazar et al. (2020) recently used the C2C to examine associations of race/ethnicity with participant willingness to be contacted about research studies. To address questions about the generalizability of the estimated associations, we estimate propensity weights for self-selection into the convenience sample using data from the National Health and Nutrition Examination Survey (NHANES). We create a combined dataset of C2C and NHANES subjects and compare different approaches (logistic regression, covariate balancing propensity score, entropy balancing, and random forest) for estimating the probability of membership in C2C relative to NHANES. We propose methods to estimate the variance of parameter estimates that account for the uncertainty that arises from estimating the propensity weights. Simulation studies explore the impact of propensity weight estimation on uncertainty. We demonstrate the approach by repeating the analysis of Salazar et al. with the estimated propensity weights for the C2C subjects and contrast the results of the two analyses. This method can be implemented using our estweight package in R, available on GitHub.
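The weighting idea described above can be illustrated with a minimal sketch. It is not the estweight package or the authors' implementation: it uses simulated, hypothetical education data in place of C2C and NHANES, a single covariate, a plain logistic regression fitted by Newton-Raphson, and inverse-odds weights (1 - p) / p for the convenience-sample units. It also ignores the NHANES survey weights, which a real analysis would need to consider. All variable and function names are assumptions made for illustration.

```python
import math
import random

random.seed(0)

# Hypothetical data: years of education (NOT real C2C or NHANES values).
# Equal spreads are assumed so that a linear logistic model is well specified.
n = 2000
registry = [random.gauss(16.0, 2.5) for _ in range(n)]   # convenience sample, z = 1
reference = [random.gauss(13.0, 2.5) for _ in range(n)]  # reference survey, z = 0

# Combined dataset with a membership indicator, as in the abstract.
x = registry + reference
z = [1] * n + [0] * n
center = sum(x) / len(x)            # centre the covariate for numerical stability
xc = [xi - center for xi in x]


def fit_logistic(xs, zs, iters=25):
    """Fit P(z = 1 | x) = sigmoid(b0 + b1 * x) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, zi in zip(xs, zs):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = zi - p               # score contribution
            v = p * (1.0 - p)        # information contribution
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^{-1} g for the 2x2 information matrix H.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


b0, b1 = fit_logistic(xc, z)

# Inverse-odds weights for registry units: (1 - p) / p = exp(-(b0 + b1 * x)).
weights = [math.exp(-(b0 + b1 * (xi - center))) for xi in registry]

unweighted_mean = sum(registry) / n
weighted_mean = sum(w * xi for w, xi in zip(weights, registry)) / sum(weights)
ref_mean = sum(reference) / n

print(f"registry mean (unweighted): {unweighted_mean:.2f}")
print(f"registry mean (weighted):   {weighted_mean:.2f}")
print(f"reference mean:             {ref_mean:.2f}")
```

In this toy setting the weighted registry mean should move from the unweighted value toward the reference-sample mean, which is the sense in which the weights correct for self-selection on education. The sketch treats the weights as fixed; the variance methods mentioned in the abstract address the additional uncertainty from estimating them, which is not shown here.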
The inverse probability weighting approach is popular for evaluating treatment effects in observational studies, but extreme propensity scores could bias the estimator and induce excessive variance. Recently, the overlap weighting approach has been p
Hierarchical inference in (generalized) regression problems is powerful for finding significant groups or even single covariates, especially in high-dimensional settings where identifiability of the entire regression parameter vector may be ill-posed
A straightforward application of semi-supervised machine learning to the problem of treatment effect estimation would be to consider data as unlabeled if treatment assignment and covariates are observed but outcomes are unobserved. According to this
The increasing prevalence of rich sources of data and the availability of electronic medical record databases and electronic registries opens tremendous opportunities for enhancing medical research. For example, controlled trials are ubiquitously use
The popularity of online surveys has increased the prominence of using weights that capture units' probabilities of inclusion for claims of representativeness. Yet, much uncertainty remains regarding how these weights should be employed in the analysi