No Arabic abstract
Information from various data sources is increasingly available nowadays. However, some of the data sources may produce biased estimation due to commonly encountered biased sampling, population heterogeneity, or model misspecification. This calls for statistical methods to combine information in the presence of biased sources. In this paper, a robust data fusion-extraction method is proposed. The method can produce a consistent estimator of the parameter of interest even if many of the data sources are biased. The proposed estimator is easy to compute and only employs summary statistics, and hence can be applied to many different fields, e.g. meta-analysis, Mendelian randomisation and distributed system. Moreover, the proposed estimator is asymptotically equivalent to the oracle estimator that only uses data from unbiased sources under some mild conditions. Asymptotic normality of the proposed estimator is also established. In contrast to the existing meta-analysis methods, the theoretical properties are guaranteed even if both the number of data sources and the dimension of the parameter diverge as the sample size increases, which ensures the performance of the proposed method over a wide range. The robustness and oracle property is also evaluated via simulation studies. The proposed method is applied to a meta-analysis data set to evaluate the surgical treatment for the moderate periodontal disease, and a Mendelian randomization data set to study the risk factors of head and neck cancer.
Spatial extent inference (SEI) is widely used across neuroimaging modalities to study brain-phenotype associations that inform our understanding of disease. Recent studies have shown that Gaussian random field (GRF) based tools can have inflated family-wise error rates (FWERs). This has led to fervent discussion as to which preprocessing steps are necessary to control the FWER using GRF-based SEI. The failure of GRF-based methods is due to unrealistic assumptions about the covariance function of the imaging data. The permutation procedure is the most robust SEI tool because it estimates the covariance function from the imaging data. However, the permutation procedure can fail because its assumption of exchangeability is violated in many imaging modalities. Here, we propose the (semi-) parametric bootstrap joint (PBJ; sPBJ) testing procedures that are designed for SEI of multilevel imaging data. The sPBJ procedure uses a robust estimate of the covariance function, which yields consistent estimates of standard errors, even if the covariance model is misspecified. We use our methods to study the association between performance and executive functioning in a working fMRI study. The sPBJ procedure is robust to variance misspecification and maintains nominal FWER in small samples, in contrast to the GRF methods. The sPBJ also has equal or superior power to the PBJ and permutation procedures. We provide an R package https://github.com/simonvandekar/pbj to perform inference using the PBJ and sPBJ procedures
As the most important tool to provide high-level evidence-based medicine, researchers can statistically summarize and combine data from multiple studies by conducting meta-analysis. In meta-analysis, mean differences are frequently used effect size measurements to deal with continuous data, such as the Cohens d statistic and Hedges g statistic values. To calculate the mean difference based effect sizes, the sample mean and standard deviation are two essential summary measures. However, many of the clinical reports tend not to directly record the sample mean and standard deviation. Instead, the sample size, median, minimum and maximum values and/or the first and third quartiles are reported. As a result, researchers have to transform the reported information to the sample mean and standard deviation for further compute the effect size. Since most of the popular transformation methods were developed upon the normality assumption of the underlying data, it is necessary to perform a pre-test before transforming the summary statistics. In this article, we had introduced test statistics for three popular scenarios in meta-analysis. We suggests medical researchers to perform a normality test of the selected studies before using them to conduct further analysis. Moreover, we applied three different case studies to demonstrate the usage of the newly proposed test statistics. The real data case studies indicate that the new test statistics are easy to apply in practice and by following the recommended path to conduct the meta-analysis, researchers can obtain more reliable conclusions.
Bayesian likelihood-free methods implement Bayesian inference using simulation of data from the model to substitute for intractable likelihood evaluations. Most likelihood-free inference methods replace the full data set with a summary statistic before performing Bayesian inference, and the choice of this statistic is often difficult. The summary statistic should be low-dimensional for computational reasons, while retaining as much information as possible about the parameter. Using a recent idea from the interpretable machine learning literature, we develop some regression-based diagnostic methods which are useful for detecting when different parts of a summary statistic vector contain conflicting information about the model parameters. Conflicts of this kind complicate summary statistic choice, and detecting them can be insightful about model deficiencies and guide model improvement. The diagnostic methods developed are based on regression approaches to likelihood-free inference, in which the regression model estimates the posterior density using summary statistics as features. Deletion and imputation of part of the summary statistic vector within the regression model can remove conflicts and approximate posterior distributions for summary statistic subsets. A larger than expected change in the estimated posterior density following deletion and imputation can indicate a conflict in which inferences of interest are affected. The usefulness of the new methods is demonstrated in a number of real examples.
We introduce the notion of intensity reweighted moment pseudostationary point processes on linear networks. Based on arbitrary general regular linear network distances, we propose geometrically correct
Predicting risks of chronic diseases has become increasingly important in clinical practice. When a prediction model is developed in a given source cohort, there is often a great interest to apply the model to other cohorts. However, due to potential discrepancy in baseline disease incidences between different cohorts and shifts in patient composition, the risk predicted by the original model often under- or over-estimates the risk in the new cohort. The remedy of such a poorly calibrated prediction is needed for proper medical decision-making. In this article, we assume the relative risks of predictors are the same between the two cohorts, and propose a novel weighted estimating equation approach to re-calibrating the projected risk for the targeted population through updating the baseline risk. The recalibration leverages the knowledge about the overall survival probabilities for the disease of interest and competing events, and the summary information of risk factors from the targeted population. The proposed re-calibrated risk estimators gain efficiency if the risk factor distributions are the same for both the source and target cohorts, and are robust with little bias if they differ. We establish the consistency and asymptotic normality of the proposed estimators. Extensive simulation studies demonstrate that the proposed estimators perform very well in terms of robustness and efficiency in finite samples. A real data application to colorectal cancer risk prediction also illustrates that the proposed method can be used in practice for model recalibration.