Optimal integrating learning for split questionnaire design type data

Added by Jingfu Peng

Publication date 2021

Field: Mathematical Statistics

Language: English

Authors: Cunjie Lin, Jingfu Peng, Yichen Qin

Methodology Applications

Abstract

In the era of data science, it is common to encounter data in which different subsets of variables are observed for different cases. One example is the split questionnaire design (SQD), which is adopted to reduce respondent fatigue and improve response rates by assigning different subsets of the questionnaire to different sampled respondents. A general question then is how to estimate the regression function from such block-wise observed data. Currently, this is often done with missing-data methods, which can suffer from intensive computational cost, high variability, and possibly large modeling biases in real applications. In this article, we develop a novel approach for estimating the regression function for SQD-type data. We first construct a list of candidate models using the available data blocks separately, and then combine the estimates properly to make efficient use of all the information. We show that the resulting averaged model is asymptotically optimal in the sense that its squared loss and risk are asymptotically equivalent to those of the best but infeasible averaged estimator. Both simulated examples and an application to the SQD dataset from the European Social Survey show the promise of the proposed method.
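
The two-step idea in the abstract (fit candidate models on the data blocks separately, then combine their estimates) can be illustrated with a small sketch. This is not the authors' estimator. The simulated data, the three linear candidate models, the helper names (`ols`, `fit_candidates`, `choose_weights`, `simulate`) and, above all, the weighting rule are assumptions made for illustration: here the weights are chosen by a grid search over the simplex that minimizes squared error on a hypothetical fully observed validation block, whereas the paper's own weight criterion and its optimality theory are not reproduced.

```python
import random
from itertools import product


def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y,
    solved with Gaussian elimination and partial pivoting."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for k in range(c, p):
                A[r][k] -= f * A[c][k]
            b[r] -= f * b[c]
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        beta[r] = (b[r] - sum(A[r][k] * beta[k] for k in range(r + 1, p))) / A[r][r]
    return beta


def design(row, cols):
    """Intercept plus the covariates a candidate model uses."""
    return [1.0] + [row[c] for c in cols]


def fit_candidates(rows, ys, candidates):
    """Fit one linear model per covariate subset, using every row on which
    that subset was observed (None marks an item the respondent was not asked)."""
    models = []
    for cols in candidates:
        idx = [i for i, r in enumerate(rows) if all(r[c] is not None for c in cols)]
        beta = ols([design(rows[i], cols) for i in idx], [ys[i] for i in idx])
        models.append((cols, beta))
    return models


def predict(model, row):
    cols, beta = model
    return sum(bj * xj for bj, xj in zip(beta, design(row, cols)))


def simplex_grid(k, step):
    """All weight vectors on the k-simplex with resolution 1/step."""
    for head in product(range(step + 1), repeat=k - 1):
        if sum(head) <= step:
            yield [h / step for h in head] + [(step - sum(head)) / step]


def choose_weights(models, rows, ys, step=20):
    """Assumed rule: pick simplex weights minimizing squared error on fully observed rows."""
    preds = [[predict(m, r) for r in rows] for m in models]
    best_w, best_mse = None, float("inf")
    for w in simplex_grid(len(models), step):
        mse = sum(
            (y - sum(wk * pk[i] for wk, pk in zip(w, preds))) ** 2
            for i, y in enumerate(ys)
        ) / len(ys)
        if mse < best_mse:
            best_w, best_mse = w, mse
    return best_w, best_mse


def simulate(n, rng, unasked=None):
    """Hypothetical data: y = 1 + 2*x1 + x2 - x3 + noise."""
    rows, ys = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(3)]
        ys.append(1 + 2 * x[0] + x[1] - x[2] + rng.gauss(0, 0.5))
        if unasked is not None:
            x[unasked] = None  # this block was not asked item `unasked`
        rows.append(x)
    return rows, ys


rng = random.Random(0)
a_rows, a_y = simulate(200, rng, unasked=2)  # block A: x3 not asked
b_rows, b_y = simulate(200, rng, unasked=1)  # block B: x2 not asked
v_rows, v_y = simulate(50, rng)              # assumed fully observed block
candidates = [(0,), (0, 1), (0, 2)]
models = fit_candidates(a_rows + b_rows, a_y + b_y, candidates)
weights, mse = choose_weights(models, v_rows, v_y)
print("weights:", weights, "validation MSE:", round(mse, 3))
```

Each candidate is fit on every row where its covariates were asked, so the model using only `x1` draws on both blocks, while the other two each use a single block. Restricting the weights to the simplex keeps the result a convex average of the candidates, and because the grid contains the simplex vertices, the selected combination never does worse on the validation rows than the best single candidate.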

Related research

Large Datasets, Bias and Model Oriented Optimal Design of Experiments

Elena Pesce, Eva Riccomagno (2018)

We review recent literature that proposes to adapt ideas from classical model based optimal design of experiments to problems of data selection of large datasets. Special attention is given to bias reduction and to protection against confounders. Some new results are presented. Theoretical and computational comparisons are made.

Methodology Applications

Trimmed Match Design for Randomized Paired Geo Experiments

Aiyou Chen, Marco Longfils, Nicolas Remy (2021)

How to measure the incremental Return On Ad Spend (iROAS) is a fundamental problem for the online advertising industry. A standard modern tool is to run randomized geo experiments, where experimental units are non-overlapping ad-targetable geographical areas (Vaver & Koehler 2011). However, designing a reliable and cost-effective geo experiment can be complicated: for example, 1) the number of geos is often small, 2) the response metric (e.g. revenue) can be very heavy-tailed across geos due to geo heterogeneity, and 3) the response metric can vary dramatically over time. To address these issues, we propose a robust nonparametric design method, called Trimmed Match Design (TMD), which extends the idea of Trimmed Match (Chen & Au 2019) and integrates the techniques of optimal subset pairing and sample splitting in a novel and systematic manner. Simulation and real case studies are presented. We also point out a few open problems for future research.

Methodology Applications

Bayesian local exchangeability design for phase II basket trials

Yilin Liu, Michael Kane, Denise Esserman (2021)

We propose an information borrowing strategy for the design and monitoring of phase II basket trials based on the local multisource exchangeability assumption between baskets (disease types). We construct a flexible statistical design using the proposed strategy. Our approach partitions potentially heterogeneous baskets into non-exchangeable blocks. Information borrowing is only allowed to occur locally, i.e., among similar baskets within the same block. The amount of borrowing is determined by between-basket similarities. The number of blocks and block memberships are inferred from data based on the posterior probability of each partition. The proposed method is compared to the multisource exchangeability model and to Simon's two-stage design. In a variety of simulation scenarios, we demonstrate that the proposed method is able to maintain the type I error rate and achieve desirable basket-wise power. In addition, our method is computationally efficient compared to existing Bayesian methods in that the posterior profiles of interest can be derived explicitly without the need for sampling algorithms.

Methodology Applications

Visualizing Outliers in High Dimensional Functional Data for Task fMRI data exploration

Yasser Aleman-Gomez, Manuel Desco (2021)

Task-based functional magnetic resonance imaging (task fMRI) is a non-invasive technique that allows identifying brain regions whose activity changes when individuals are asked to perform a given task. This contributes to the understanding of how the human brain is organized into functionally distinct subdivisions. Task fMRI experiments from high-resolution scans provide hundreds of thousands of longitudinal signals for each individual, corresponding to measurements of brain activity at each voxel of the brain over the duration of the experiment. In this context, we propose visualization techniques for high-dimensional functional data, relying on depth-based notions, that allow computationally efficient two-dimensional representations of task fMRI data and shed light on sample composition, outlier presence and individual variability. We believe this step is crucial prior to any inferential approach that seeks to identify neuroscientific patterns across individuals, tasks and brain regions. We illustrate the proposed technique through a simulation study and demonstrate its application on a motor and language task fMRI experiment.

Methodology Applications

Bayesian data fusion for unmeasured confounding

Leah Comment, Brent A. Coull, Corwin Zigler (2019)

Bayesian causal inference offers a principled approach to policy evaluation of proposed interventions on mediators or time-varying exposures. We outline a general approach to the estimation of causal quantities for settings with time-varying confounding, such as exposure-induced mediator-outcome confounders. We further extend this approach to propose two Bayesian data fusion (BDF) methods for unmeasured confounding. Using informative priors on quantities relating to the confounding bias parameters, our methods incorporate data from an external source where the confounder is measured in order to make inferences about causal estimands in the main study population. We present results from a simulation study comparing our data fusion methods to two common frequentist correction methods for unmeasured confounding bias in the mediation setting. We also demonstrate our method with an investigation of the role of stage at cancer diagnosis in contributing to Black-White colorectal cancer survival disparities.

Methodology Applications
