Adjusting for Spatial Effects in Genomic Prediction

62 0 0.0 ( 0 )

Download Cite

Added by Somak Dutta

Publication date 2019

fields Mathematical Statistics

and research's language is English

Authors Xiaojun Mao - Somak Dutta - Raymond K. W. Wong

Applications

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

This paper investigates the problem of adjusting for spatial effects in genomic prediction. Despite being seldomly considered in genomic prediction, spatial effects often affect phenotypic measurements of plants. We consider a Gaussian random field model with an additive covariance structure that incorporates genotype effects, spatial effects and subpopulation effects. An empirical study shows the existence of spatial effects and heterogeneity across different subpopulation families, while simulations illustrate the improvement in selecting genotypically superior plants by adjusting for spatial effects in genomic prediction.

rate research

Adjusting for Scorekeeper Bias in NBA Box Scores

105 - Matthew van Bommel , Luke Bornn 2016

Box score statistics in the National Basketball Association are used to measure and evaluate player performance. Some of these statistics are subjective in nature and since box score statistics are recorded by scorekeepers hired by the home team for each game, there exists potential for inconsistency and bias. These inconsistencies can have far reaching consequences, particularly with the rise in popularity of daily fantasy sports. Using box score data, we estimate models able to quantify both the bias and the generosity of each scorekeeper for two of the most subjective statistics: assists and blocks. We then use optical player tracking data for the 2014-2015 season to improve the assist model by including other contextual spatio-temporal variables such as time of possession, player locations, and distance traveled. From this model, we present results measuring the impact of the scorekeeper and of the other contextual variables on the probability of a pass being recorded as an assist. Results for adjusting season assist totals to remove scorekeeper influence are also presented.

Applications

PSTN: Periodic Spatial-temporal Deep Neural Network for Traffic Condition Prediction

132 - Tiange Wang , Zijun Zhang , 2021

Accurate forecasting of traffic conditions is critical for improving safety, stability, and efficiency of a city transportation system. In reality, it is challenging to produce accurate traffic forecasts due to the complex and dynamic spatiotemporal correlations. Most existing works only consider partial characteristics and features of traffic data, and result in unsatisfactory performances on modeling and forecasting. In this paper, we propose a periodic spatial-temporal deep neural network (PSTN) with three pivotal modules to improve the forecasting performance of traffic conditions through a novel integration of three types of information. First, the historical traffic information is folded and fed into a module consisting of a graph convolutional network and a temporal convolutional network. Second, the recent traffic information together with the historical output passes through the second module consisting of a graph convolutional network and a gated recurrent unit framework. Finally, a multi-layer perceptron is applied to process the auxiliary road attributes and output the final predictions. Experimental results on two publicly accessible real-world urban traffic data sets show that the proposed PSTN outperforms the state-of-the-art benchmarks by significant margins for short-term traffic conditions forecasting

Applications Machine Learning Machine Learning

Multi-resolution Spatial Regression for Aggregated Data with an Application to Crop Yield Prediction

95 - Harrison Zhu , Adam Howes , Owen van Eer 2021

We develop a new methodology for spatial regression of aggregated outputs on multi-resolution covariates. Such problems often occur with spatial data, for example in crop yield prediction, where the output is spatially-aggregated over an area and the covariates may be observed at multiple resolutions. Building upon previous work on aggregated output regression, we propose a regression framework to synthesise the effects of the covariates at different resolutions on the output and provide uncertainty estimation. We show that, for a crop yield prediction problem, our approach is more scalable, via variational inference, than existing multi-resolution regression models. We also show that our framework yields good predictive performance, compared to existing multi-resolution crop yield models, whilst being able to provide estimation of the underlying spatial effects.

Applications Methodology

Risk prediction for prostate cancer recurrence through regularized estimation with simultaneous adjustment for nonlinear clinical effects

366 - Qi Long , Matthias Chung , Carlos S. Moreno 2011

In biomedical studies it is of substantial interest to develop risk prediction scores using high-dimensional data such as gene expression data for clinical endpoints that are subject to censoring. In the presence of well-established clinical risk factors, investigators often prefer a procedure that also adjusts for these clinical variables. While accelerated failure time (AFT) models are a useful tool for the analysis of censored outcome data, it assumes that covariate effects on the logarithm of time-to-event are linear, which is often unrealistic in practice. We propose to build risk prediction scores through regularized rank estimation in partly linear AFT models, where high-dimensional data such as gene expression data are modeled linearly and important clinical variables are modeled nonlinearly using penalized regression splines. We show through simulation studies that our model has better operating characteristics compared to several existing models. In particular, we show that there is a nonnegligible effect on prediction as well as feature selection when nonlinear clinical effects are misspecified as linear. This work is motivated by a recent prostate cancer study, where investigators collected gene expression data along with established prognostic clinical variables and the primary endpoint is time to prostate cancer recurrence.

Applications

Balancing spatial and non-spatial variation in varying coefficient modeling: a remedy for spurious correlation

63 - Daisuke Murakami , Daniel A. Griffith 2020

This study discusses the importance of balancing spatial and non-spatial variation in spatial regression modeling. Unlike spatially varying coefficients (SVC) modeling, which is popular in spatial statistics, non-spatially varying coefficients (NVC) modeling has largely been unexplored in spatial fields. Nevertheless, as we will explain, consideration of non-spatial variation is needed not only to improve model accuracy but also to reduce spurious correlation among varying coefficients, which is a major problem in SVC modeling. We consider a Moran eigenvector approach modeling spatially and non-spatially varying coefficients (S&NVC). A Monte Carlo simulation experiment comparing our S&NVC model with existing SVC models suggests both modeling accuracy and computational efficiency for our approach. Beyond that, somewhat surprisingly, our approach identifies true and spurious correlations among coefficients nearly perfectly, even when usual SVC models suffer from severe spurious correlations. It implies that S&NVC model should be used even when the analysis purpose is modeling SVCs. Finally, our S&NVC model is employed to analyze a residential land price dataset. Its results suggest existence of both spatial and non-spatial variation in regression coefficients in practice. The S&NVC model is now implemented in the R package spmoran.

Applications