Large Datasets, Bias and Model Oriented Optimal Design of Experiments

Added by Elena Pesce

Publication date 2018

Field: Mathematical Statistics

Language: English

Authors Elena Pesce - Eva Riccomagno

Methodology Applications

We review recent literature that proposes adapting ideas from classical model-based optimal design of experiments to the problem of selecting data from large datasets. Special attention is given to bias reduction and to protection against confounders. Some new results are presented. Theoretical and computational comparisons are made.
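
As an illustration of the general idea of design-based data selection, the following is a minimal sketch, not the authors' method: it assumes a linear regression model and uses greedy D-optimality, picking rows of a covariate matrix so that the determinant of the information matrix of the selected subset grows as fast as possible. The function names, the `ridge` regularizer, and the greedy strategy are all assumptions made for this example.

```python
import numpy as np


def greedy_d_optimal_subset(X, k, ridge=1e-3):
    """Greedily pick k distinct rows of X to maximize log det(X_S^T X_S + ridge * I).

    `ridge` is an assumption of this sketch: it keeps the information matrix
    invertible before enough rows have been selected.
    """
    n, p = X.shape
    M_inv = np.eye(p) / ridge              # inverse of the initial matrix ridge * I
    available = np.ones(n, dtype=bool)
    chosen = []
    for _ in range(k):
        # Matrix determinant lemma: adding row x multiplies det(M) by 1 + x^T M^{-1} x,
        # so the best row is the one with the largest quadratic form.
        gains = np.einsum("ij,jk,ik->i", X, M_inv, X)
        gains[~available] = -np.inf
        i = int(np.argmax(gains))
        chosen.append(i)
        available[i] = False
        # Sherman-Morrison rank-one update of M^{-1} after adding row i.
        v = M_inv @ X[i]
        M_inv = M_inv - np.outer(v, v) / (1.0 + X[i] @ v)
    return chosen


def log_det_information(X, rows, ridge=1e-3):
    """Log-determinant of the regularized information matrix of the selected rows."""
    Xs = X[rows]
    _, logdet = np.linalg.slogdet(Xs.T @ Xs + ridge * np.eye(X.shape[1]))
    return logdet


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))          # synthetic covariates, for illustration only
    picked = greedy_d_optimal_subset(X, 20)
    print("greedy log det:", log_det_information(X, picked))
    print("first-20 log det:", log_det_information(X, list(range(20))))
```

This selection looks only at the covariates, so on its own it does nothing about the bias and confounding issues that the review emphasizes.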

Robust multi-stage model-based design of optimal experiments for nonlinear estimation

Anwesh Reddy Gottu Mukkula, Michal Mateáš, Miroslav Fikar (2020)

We study approaches to robust model-based design of experiments in the context of maximum-likelihood estimation. These approaches robustify model-based methodologies for the design of optimal experiments by accounting for the effect of parametric uncertainty. We study the problem of robust optimal design of experiments in the framework of nonlinear least-squares parameter estimation using linearized confidence regions. We investigate several well-known robustification frameworks in this respect and propose a novel methodology based on multi-stage robust optimization. The proposed methodology targets problems in which the experiments are designed sequentially, with the possibility of re-estimation between experiments. The multi-stage formalism aids in identifying experiments that are better conducted in the early phase of experimentation, when parameter knowledge is poor. We demonstrate the findings and effectiveness of the proposed methodology using four case studies of varying complexity.

Methodology Machine Learning Systems and Control

Trimmed Match Design for Randomized Paired Geo Experiments

Aiyou Chen, Marco Longfils, Nicolas Remy (2021)

How to measure the incremental Return On Ad Spend (iROAS) is a fundamental problem for the online advertising industry. A standard modern tool is to run randomized geo experiments, where experimental units are non-overlapping ad-targetable geographical areas (Vaver & Koehler 2011). However, how to design a reliable and cost-effective geo experiment can be complicated, for example: 1) the number of geos is often small, 2) the response metric (e.g. revenue) across geos can be very heavy-tailed due to geo heterogeneity, and furthermore 3) the response metric can vary dramatically over time. To address these issues, we propose a robust nonparametric method for the design, called Trimmed Match Design (TMD), which extends the idea of Trimmed Match (Chen & Au 2019) and furthermore integrates the techniques of optimal subset pairing and sample splitting in a novel and systematic manner. Some simulation and real case studies are presented. We also point out a few open problems for future research.

Methodology Applications

Optimal integrating learning for split questionnaire design type data

Cunjie Lin, Jingfu Peng, Yichen Qin (2021)

In the era of data science, it is common to encounter data with different subsets of variables obtained for different cases. An example is the split questionnaire design (SQD), which is adopted to reduce respondent fatigue and improve response rates by assigning different subsets of the questionnaire to different sampled respondents. A general question then is how to estimate the regression function based on such block-wise observed data. Currently, this is often carried out with the aid of missing data methods, which may unfortunately suffer from high computational cost, high variability, and possibly large modeling biases in real applications. In this article, we develop a novel approach for estimating the regression function for SQD-type data. We first construct a list of candidate models using the available data blocks separately, and then combine the estimates properly to make efficient use of all the information. We show that the resulting averaged model is asymptotically optimal in the sense that its squared loss and risk are asymptotically equivalent to those of the best but infeasible averaged estimator. Both simulated examples and an application to the SQD dataset from the European Social Survey show the promise of the proposed method.

Methodology Applications

Bridging preference-based instrumental variable studies and cluster-randomized encouragement experiments: study design, noncompliance, and average cluster effect ratio

Bo Zhang, Siyu Heng, Emily J. MacKay (2020)

Instrumental variable methods are widely used in medical and social science research to draw causal conclusions when the treatment and outcome are confounded by unmeasured confounding variables. One important feature of such studies is that the instrumental variable is often applied at the cluster level, e.g., a hospital's or physician's preference for a certain treatment, where each hospital or physician naturally defines a cluster. This paper proposes to embed such observational instrumental variable data into a cluster-randomized encouragement experiment using statistical matching. Potential outcomes and causal assumptions underpinning the design are formalized and examined. Testing procedures for two commonly used estimands, Fisher's sharp null hypothesis and the pooled effect ratio, are extended to the current setting. We then introduce a novel cluster-heterogeneous proportional treatment effect model and the relevant estimand: the average cluster effect ratio. This new estimand is advantageous over the structural parameter in a constant proportional treatment effect model in that it allows treatment heterogeneity, and is advantageous over the pooled effect ratio estimand in that it is immune to Simpson's paradox. We develop an asymptotically valid randomization-based testing procedure for this new estimand based on solving a mixed integer quadratically-constrained optimization problem. The proposed design and inferential methods are applied to a study of the effect of using transesophageal echocardiography during CABG surgery on patients' 30-day mortality rate.

Methodology Applications

Optimal Bayesian hierarchical model to accelerate the development of tissue-agnostic drugs and basket trials

Liyun Jiang, Lei Nie, Fangrong Yan (2020)

Tissue-agnostic trials enroll patients based on their genetic biomarkers, not tumor type, in an attempt to determine if a new drug can successfully treat disease conditions based on biomarkers. The Bayesian hierarchical model (BHM) provides an attractive approach to designing phase II tissue-agnostic trials by allowing information borrowing across multiple disease types. In this article, we elucidate two intrinsic and inevitable issues that may limit the use of BHM in tissue-agnostic trials: sensitivity to the prior specification of the shrinkage parameter and the competing interest among disease types in increasing power and controlling type I error. To address these issues, we propose the optimal BHM (OBHM) approach. With OBHM, we first specify a flexible utility function to quantify the tradeoff between type I error and power across disease types based on the study objectives, and then we select the prior of the shrinkage parameter to optimize the utility function of clinical and regulatory interest. OBHM effectively balances type I and II errors, addresses the sensitivity of the prior selection, and reduces the unwarranted subjectivity in the prior selection. A simulation study shows that the resulting OBHM and its extensions, clustered OBHM (COBHM) and adaptive OBHM (AOBHM), have desirable operating characteristics, outperforming some existing methods with better balanced power and type I error control. Our method provides a systematic, rigorous way to apply BHM and solve the common problem of blindly using a non-informative inverse-gamma prior (with a large variance) or arbitrarily chosen priors that may lead to pathological statistical properties.

Methodology Applications
