
Large Datasets, Bias and Model Oriented Optimal Design of Experiments

Published by: Elena Pesce
Publication date: 2018
Research field: Mathematical Statistics
Paper language: English





We review recent literature that proposes to adapt ideas from classical model-based optimal design of experiments to problems of data selection from large datasets. Special attention is given to bias reduction and to protection against confounders. Some new results are presented. Theoretical and computational comparisons are made.
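As a rough illustration of the data-selection idea (not code from the paper; the function name, the D-optimality criterion, and the random-exchange search are illustrative assumptions), the following sketch picks a subset of rows from a large candidate matrix by approximately maximising the log-determinant of the information matrix of the selected rows.

```python
import numpy as np

def d_optimal_subset(X, n, n_iter=2000, seed=0):
    """Random-exchange search for an n-row subset of the candidate matrix X
    that approximately maximises log det(X_s' X_s), a naive D-optimal
    data-selection criterion (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    idx = rng.choice(N, size=n, replace=False)          # random starting subset
    best = np.linalg.slogdet(X[idx].T @ X[idx])[1]      # log-det of the information matrix
    for _ in range(n_iter):
        out_pos = rng.integers(n)                        # position to swap out
        cand = rng.integers(N)                           # candidate row to swap in
        if cand in idx:
            continue
        trial = idx.copy()
        trial[out_pos] = cand
        val = np.linalg.slogdet(X[trial].T @ X[trial])[1]
        if val > best:                                   # keep only improving exchanges
            idx, best = trial, val
    return idx, best

# toy usage: pick 50 informative rows out of 10,000 candidates
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(10_000), rng.normal(size=(10_000, 3))])
subset, logdet = d_optimal_subset(X, 50)
```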




Read also

We study approaches to robust model-based design of experiments in the context of maximum-likelihood estimation. These approaches provide robustification of model-based methodologies for the design of optimal experiments by accounting for the effect of parametric uncertainty. We study the problem of robust optimal design of experiments in the framework of nonlinear least-squares parameter estimation using linearized confidence regions. We investigate several well-known robustification frameworks in this respect and propose a novel methodology based on multi-stage robust optimization. The proposed methodology aims at problems where the experiments are designed sequentially, with a possibility of re-estimation between the experiments. The multi-stage formalism aids in identifying experiments that are better conducted in the early phase of experimentation, where parameter knowledge is poor. We demonstrate the findings and effectiveness of the proposed methodology using four case studies of varying complexity.
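To make the maximin flavour of such robustification concrete, here is a hedged sketch (the exponential-decay model, the scenario set, and the exhaustive search are illustrative assumptions, not the paper's multi-stage method): it selects the sampling times whose worst-case log-determinant of the Fisher information over a set of plausible parameter values is largest.

```python
import numpy as np
from itertools import combinations

def fim(design, theta):
    """Fisher information matrix for the toy model y = a*exp(-b*x)
    with i.i.d. Gaussian noise of unit variance."""
    a, b = theta
    J = np.zeros((2, 2))
    for x in design:
        g = np.array([np.exp(-b * x), -a * x * np.exp(-b * x)])  # gradient of the mean w.r.t. (a, b)
        J += np.outer(g, g)
    return J

def maximin_design(candidate_times, k, theta_scenarios):
    """Exhaustively pick the k-point design whose worst-case log-det FIM
    over the parameter scenarios is largest (a simple robust criterion)."""
    best_design, best_val = None, -np.inf
    for design in combinations(candidate_times, k):
        worst = min(np.linalg.slogdet(fim(design, th))[1] for th in theta_scenarios)
        if worst > best_val:
            best_design, best_val = design, worst
    return best_design, best_val

# usage: choose 3 sampling times, robust to uncertainty about (a, b)
times = np.linspace(0.1, 5.0, 20)
scenarios = [(1.0, 0.5), (1.0, 1.0), (1.0, 2.0)]
design, worst_case = maximin_design(times, 3, scenarios)
```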
How to measure the incremental Return On Ad Spend (iROAS) is a fundamental problem for the online advertising industry. A standard modern tool is to run randomized geo experiments, where experimental units are non-overlapping ad-targetable geographical areas (Vaver & Koehler 2011). However, how to design a reliable and cost-effective geo experiment can be complicated, for example: 1) the number of geos is often small, 2) the response metric (e.g. revenue) across geos can be very heavy-tailed due to geo heterogeneity, and furthermore 3) the response metric can vary dramatically over time. To address these issues, we propose a robust nonparametric method for the design, called Trimmed Match Design (TMD), which extends the idea of Trimmed Match (Chen & Au 2019) and furthermore integrates the techniques of optimal subset pairing and sample splitting in a novel and systematic manner. Some simulation and real case studies are presented. We also point out a few open problems for future research.
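As a hedged illustration of matched-pairs geo assignment (a much simpler cousin of Trimmed Match Design, not the TMD algorithm itself; the sorting-based pairing rule is an assumption), the sketch below pairs geos by a pre-period metric and randomises treatment within each pair.

```python
import numpy as np

def paired_geo_assignment(pre_period_metric, seed=0):
    """Matched-pairs sketch: sort geos by a pre-experiment metric so similar
    geos sit next to each other, pair adjacent geos, and randomise the
    treatment label within each pair (illustrative only)."""
    rng = np.random.default_rng(seed)
    order = np.argsort(pre_period_metric)
    pairs = [tuple(order[i:i + 2]) for i in range(0, len(order) - 1, 2)]
    assignment = {}
    for g1, g2 in pairs:
        treated_first = rng.random() < 0.5
        assignment[int(g1)] = "treatment" if treated_first else "control"
        assignment[int(g2)] = "control" if treated_first else "treatment"
    return pairs, assignment

# usage: 10 hypothetical geos with heavy-tailed pre-period revenue
revenue = np.random.default_rng(1).lognormal(mean=10.0, sigma=1.5, size=10)
pairs, assignment = paired_geo_assignment(revenue)
```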
In the era of data science, it is common to encounter data with different subsets of variables obtained for different cases. An example is the split questionnaire design (SQD), which is adopted to reduce respondent fatigue and improve response rates by assigning different subsets of the questionnaire to different sampled respondents. A general question then is how to estimate the regression function based on such block-wise observed data. Currently, this is often carried out with the aid of missing-data methods, which may unfortunately suffer from intensive computational cost, high variability, and possibly large modeling biases in real applications. In this article, we develop a novel approach for estimating the regression function for SQD-type data. We first construct a list of candidate models using the available data blocks separately, and then combine the estimates properly to make efficient use of all the information. We show that the resulting averaged model is asymptotically optimal in the sense that its squared loss and risk are asymptotically equivalent to those of the best but infeasible averaged estimator. Both simulated examples and an application to the SQD dataset from the European Social Survey show the promise of the proposed method.
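A minimal sketch of the block-wise idea, under strong simplifying assumptions (OLS per block and inverse residual-variance weights, which are not the optimal weights derived in the paper):

```python
import numpy as np

def block_model_average(blocks, x_new):
    """Fit an OLS model on each block of (rows, observed covariates) and
    average the predictions for x_new, weighting each model by the inverse
    of its in-sample residual variance (illustrative weighting rule)."""
    preds, weights = [], []
    for X_b, y_b, cols in blocks:
        beta, *_ = np.linalg.lstsq(X_b, y_b, rcond=None)
        sigma2 = np.mean((y_b - X_b @ beta) ** 2)
        preds.append(float(x_new[cols] @ beta))
        weights.append(1.0 / sigma2)
    w = np.array(weights) / np.sum(weights)
    return float(np.dot(w, preds))

# usage: two blocks observe different subsets of the covariates
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 3))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=200)
blocks = [(X[:100][:, [0, 1, 2]], y[:100], [0, 1, 2]),   # block 1: covariates 0, 1, 2
          (X[100:][:, [0, 1, 3]], y[100:], [0, 1, 3])]   # block 2: covariates 0, 1, 3
print(block_model_average(blocks, X[0]))
```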
Instrumental variable methods are widely used in medical and social science research to draw causal conclusions when the treatment and outcome are confounded by unmeasured confounding variables. One important feature of such studies is that the instrumental variable is often applied at the cluster level, e.g., a hospital's or physician's preference for a certain treatment, where each hospital or physician naturally defines a cluster. This paper proposes to embed such observational instrumental variable data into a cluster-randomized encouragement experiment using statistical matching. Potential outcomes and causal assumptions underpinning the design are formalized and examined. Testing procedures for two commonly used estimands, Fisher's sharp null hypothesis and the pooled effect ratio, are extended to the current setting. We then introduce a novel cluster-heterogeneous proportional treatment effect model and the relevant estimand: the average cluster effect ratio. This new estimand is advantageous over the structural parameter in a constant proportional treatment effect model in that it allows treatment heterogeneity, and is advantageous over the pooled effect ratio estimand in that it is immune to Simpson's paradox. We develop an asymptotically valid randomization-based testing procedure for this new estimand based on solving a mixed integer quadratically constrained optimization problem. The proposed design and inferential methods are applied to a study of the effect of using transesophageal echocardiography during CABG surgery on patients' 30-day mortality rate.
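For the Fisher sharp-null piece specifically, here is a hedged sketch of a paired-cluster randomisation test (the sign-flip statistic and the mean-difference test statistic are illustrative choices, not the paper's matching or optimization machinery):

```python
import numpy as np

def paired_cluster_sharp_null_test(encouraged, control, n_perm=10_000, seed=0):
    """Randomisation test of Fisher's sharp null of no effect in a paired
    cluster-randomised encouragement design: under the null, the encouragement
    label within each matched pair is exchangeable, so we flip the pair
    differences at random and compare to the observed mean difference."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(encouraged, float) - np.asarray(control, float)
    observed = abs(diffs.mean())
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diffs.size))
    perm_stats = np.abs((signs * diffs).mean(axis=1))
    return float(np.mean(perm_stats >= observed))        # two-sided p-value

# usage: cluster-level mortality rates in 12 hypothetical matched pairs
rng = np.random.default_rng(2)
control = rng.uniform(0.02, 0.08, size=12)
encouraged = control - 0.01 + rng.normal(0, 0.01, size=12)
print(paired_cluster_sharp_null_test(encouraged, control))
```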
Tissue-agnostic trials enroll patients based on their genetic biomarkers, not tumor type, in an attempt to determine if a new drug can successfully treat disease conditions based on biomarkers. The Bayesian hierarchical model (BHM) provides an attractive approach to designing phase II tissue-agnostic trials by allowing information borrowing across multiple disease types. In this article, we elucidate two intrinsic and inevitable issues that may limit the use of BHM for tissue-agnostic trials: sensitivity to the prior specification of the shrinkage parameter, and the competing interest among disease types in increasing power and controlling type I error. To address these issues, we propose the optimal BHM (OBHM) approach. With OBHM, we first specify a flexible utility function to quantify the tradeoff between type I error and power across disease types based on the study objectives, and then we select the prior of the shrinkage parameter to optimize the utility function of clinical and regulatory interest. OBHM effectively balances type I and II errors, addresses the sensitivity of the prior selection, and reduces unwarranted subjectivity in the prior selection. Simulation studies show that the resulting OBHM and its extensions, clustered OBHM (COBHM) and adaptive OBHM (AOBHM), have desirable operating characteristics, outperforming some existing methods with better-balanced power and type I error control. Our method provides a systematic, rigorous way to apply BHM and solve the common problem of blindly using a non-informative inverse-gamma prior (with a large variance) or arbitrarily chosen priors that may lead to pathological statistical properties.
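To illustrate the shrinkage-parameter sensitivity this abstract refers to (a normal-approximation toy, not OBHM itself; the response counts and prior mean below are made up):

```python
import numpy as np

def bhm_shrunk_estimates(successes, n, tau2, mu0=0.2):
    """Normal-approximation sketch of borrowing in a Bayesian hierarchical model:
    each disease type's observed response rate is shrunk toward a common mean
    mu0, with the between-type variance tau2 controlling how much is borrowed."""
    p_hat = successes / n
    se2 = p_hat * (1 - p_hat) / n + 1e-9                 # sampling variance of each rate
    w = tau2 / (tau2 + se2)                              # weight on the observed rate
    return w * p_hat + (1 - w) * mu0

successes = np.array([8.0, 2.0, 5.0, 1.0])               # responders per disease type
n = np.array([20.0, 20.0, 20.0, 20.0])
for tau2 in (0.001, 0.01, 0.1):                          # strong vs. weak borrowing
    print(tau2, np.round(bhm_shrunk_estimates(successes, n, tau2), 3))
```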