Genomic imprinting is thought to play an important role in seed development in flowering plants. A seed in a flowering plant normally contains a diploid embryo and a triploid endosperm. Empirical studies have shown that some economically important endosperm traits are genetically controlled by imprinted genes. However, the exact number and location of the imprinted genes are largely unknown due to the lack of efficient statistical mapping methods. Here we propose a general statistical variance-components framework that utilizes the natural information of sex-specific allelic sharing among sibpairs in line crosses to map imprinted quantitative trait loci (iQTL) underlying endosperm traits. We propose a new variance-components partition method that accounts for the unique characteristics of the triploid endosperm genome, and develop a restricted maximum likelihood (REML) estimation method in an interval scan for estimating and testing genome-wide iQTL effects. The cytoplasmic maternal effect, which is thought to have a primary influence on yield and grain quality, is also considered when testing for genomic imprinting. An extension to multiple-iQTL analysis is proposed. The asymptotic distribution of the likelihood ratio test for the variance components under irregular conditions is studied. Both a simulation study and a real data analysis indicate the good performance and power of the developed approach.
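A rough illustration of the kind of variance-components partition described above is sketched below; this is an assumed form, not the authors' exact model, with $\pi^{m}_{ij}$ and $\pi^{f}_{ij}$ denoting maternally and paternally derived allele-sharing proportions between sibpair members $i$ and $j$ at the tested locus.

```latex
% Illustrative sketch only; the paper's actual partition for the triploid
% endosperm genome may differ in both terms and coefficients.
\[
  \operatorname{Cov}(y_{i}, y_{j}) \;=\;
      \pi^{m}_{ij}\,\sigma^{2}_{m}      % maternally derived iQTL variance
    \;+\; \pi^{f}_{ij}\,\sigma^{2}_{f}  % paternally derived iQTL variance
    \;+\; \phi_{ij}\,\sigma^{2}_{g}     % residual polygenic variance
    \;+\; \delta_{ij}\,\sigma^{2}_{c}   % shared cytoplasmic maternal effect
    \;+\; \mathbf{1}[i=j]\,\sigma^{2}_{e},
\]
```

Under a sketch of this kind, imprinting at the scanned locus would be tested via $H_{0}\colon \sigma^{2}_{m} = \sigma^{2}_{f}$ fitted by REML, with the likelihood ratio statistic following a mixture of $\chi^{2}$ distributions because the variance components lie on the boundary of the parameter space (the "irregular conditions" mentioned above).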
Understanding the relationship between genomic variation and variation in quantitative phenotypes such as physiology, yield, fitness, or behavior will provide important insights both for predicting adaptive evolution and for breeding schemes. A particular question is whether the genetic variation that influences quantitative phenotypes is typically the result of one or two mutations of large effect, or of many mutations of small effect. In this paper we explore this issue using the wild model legume Medicago truncatula. We show that phenotypes such as quantitative disease resistance can be well predicted using genome-wide patterns of admixture, from which it follows that there must be many mutations of small effect. Our findings demonstrate the potential of our novel whole-genome modeling (WhoGEM) method and experimentally validate, for the first time, the infinitesimal model as a mechanism for adaptation of quantitative phenotypes in plants. This insight can accelerate breeding and biomedical research programs.
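As a minimal sketch of the prediction idea (not the WhoGEM implementation itself), one can regress a quantitative phenotype on genome-wide admixture proportions and assess predictive skill by cross-validation. The arrays, group count, and model choice below are hypothetical.

```python
# Sketch: predicting a quantitative phenotype from genome-wide admixture
# proportions with a simple penalized linear model. All data are simulated.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_accessions, n_groups = 200, 8
# rows sum to 1: each accession's ancestry proportions across ancestral groups
admixture_proportions = rng.dirichlet(np.ones(n_groups), size=n_accessions)
phenotype = (admixture_proportions @ rng.normal(size=n_groups)
             + rng.normal(scale=0.5, size=n_accessions))

model = RidgeCV(alphas=np.logspace(-3, 3, 13))
r2 = cross_val_score(model, admixture_proportions, phenotype, cv=5, scoring="r2")
print(f"cross-validated R^2: {r2.mean():.2f}")
```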
Tropical cyclones (TCs) rank among the most costly natural disasters in the United States, and accurate forecasts of track and intensity are critical for emergency response. Intensity guidance has improved steadily but slowly, as the processes that drive intensity change are not fully understood. Because most TCs develop far from land-based observing networks, geostationary satellite imagery is critical for monitoring these storms. However, these complex data can be challenging to analyze in real time, and off-the-shelf machine learning algorithms have limited applicability on this front due to their "black box" structure. This study presents analytic tools that quantify convective structure patterns in infrared satellite imagery for over-ocean TCs, yielding lower-dimensional but rich representations that support analysis and visualization of how these patterns evolve during rapid intensity change. The proposed ORB feature suite targets the global Organization, Radial structure, and Bulk morphology of TCs. By combining ORB with empirical orthogonal functions, we arrive at an interpretable and rich representation of convective structure patterns that serves as input to machine learning methods. This study uses the logistic lasso, a penalized generalized linear model, to relate these predictors to rapid intensity change. Using ORB alone, binary classifiers identifying the presence (versus absence) of such intensity change events can achieve accuracy comparable to classifiers using environmental predictors alone, with a combined predictor set improving classification accuracy in some settings. More complex nonlinear machine learning methods did not perform better than the linear logistic lasso model on the current data.
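A minimal sketch of the classification step is shown below, assuming a feature matrix of ORB-derived summaries (e.g., EOF coefficients); the feature construction, labels, and data here are simulated placeholders, not the paper's actual inputs. The logistic lasso is L1-penalized logistic regression.

```python
# Sketch: L1-penalized (lasso) logistic regression relating structural features
# to a binary rapid-intensity-change label. All data are simulated.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n_samples, n_features = 500, 40          # hypothetical: EOF coefficients of ORB functions
X = rng.normal(size=(n_samples, n_features))
y = (X[:, 0] - 0.5 * X[:, 3] + rng.normal(size=n_samples) > 0).astype(int)  # 1 = rapid change

clf = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(penalty="l1", solver="saga", Cs=10, cv=5, max_iter=5000),
)
clf.fit(X, y)
# the L1 penalty zeroes out uninformative features, which aids interpretability
print("nonzero coefficients:", np.count_nonzero(clf[-1].coef_))
```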
The endowment effect, a term coined by Nobel Laureate Richard Thaler, posits that people tend to inflate the value of items they own. This bias has been studied, both theoretically and empirically, with respect to a single item. Babaioff et al. [EC18] took a first step toward extending this study beyond a single item. They proposed a specific formulation of the endowment effect in combinatorial settings, and showed that equilibrium existence with respect to the endowed valuations extends from gross substitutes to submodular valuations, but provably fails to extend to XOS valuations. Extending the endowment effect to combinatorial settings can take different forms. In this work, we devise a framework that captures a space of endowment effects, upon which we impose a partial order that preserves endowment equilibrium existence. Within this framework, we provide existence and welfare guarantees for endowment equilibria corresponding to various endowment effects. Our main results are the following: (1) For markets with XOS valuations, we introduce an endowment effect that is stronger than that of Babaioff et al., for which an endowment equilibrium is guaranteed to exist and yields at least half of the optimal welfare. Moreover, this equilibrium can be reached via a variant of the flexible ascent auction. (2) For markets with arbitrary valuations, we show that bundling leads to a sweeping positive result. In particular, if items can be prepacked into indivisible bundles, an endowment equilibrium with optimal welfare always exists. Moreover, we provide a polynomial-time algorithm that, given an arbitrary allocation $S$, computes an endowment equilibrium with the same welfare guarantee as $S$.
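For readers unfamiliar with the object being generalized, a one-parameter sketch in the spirit of the single formulation of Babaioff et al. is given below; the notation and parameterization are illustrative assumptions, and the paper's framework replaces this single definition with a partially ordered family of endowment effects.

```latex
% Sketch of an endowed valuation for a bidder who owns bundle X; alpha scales
% how strongly owned items are over-valued (alpha = 1 means no endowment effect).
\[
  v^{X}_{\alpha}(S) \;=\; v(S) \;+\; (\alpha - 1)\, v(S \cap X), \qquad \alpha \ge 1.
\]
% An endowment equilibrium is then an allocation (X_1, ..., X_n) together with
% item prices p such that each bidder i demands X_i under the endowed valuation
% v^{X_i}_{\alpha}, i.e., a Walrasian equilibrium with respect to the endowed
% valuations rather than the original ones.
```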
High-confidence prediction of complex traits such as disease risk or drug response is an ultimate goal of personalized medicine. Although genome-wide association studies have discovered thousands of well-replicated polymorphisms associated with a broad spectrum of complex traits, the combined predictive power of these associations for any given trait is generally too low to be of clinical relevance. We propose a novel systems approach to complex trait prediction, which leverages and integrates similarity in genetic, transcriptomic, or other omics-level data. We translate the omic similarity into phenotypic similarity using Kriging, a method commonly used in geostatistics and machine learning. Our method, called OmicKriging, emphasizes the use of a wide variety of systems-level data, such as those increasingly made available by comprehensive surveys of the genome, transcriptome, and epigenome, for complex trait prediction. Furthermore, our OmicKriging framework allows easy integration of prior information on the function of subsets of omics-level data from heterogeneous sources without the sometimes heavy computational burden of Bayesian approaches. Using seven disease datasets from the Wellcome Trust Case Control Consortium (WTCCC), we show that OmicKriging allows simple integration of sparse and highly polygenic components, yielding comparable performance at a fraction of the computing time of a recently published Bayesian sparse linear mixed model method. Using a cellular growth phenotype, we show that integrating mRNA and microRNA expression data substantially increases performance over either dataset alone. We also integrate genotype and expression data to predict changes in LDL cholesterol levels after statin treatment and show that OmicKriging performs better than the polygenic score method. We provide an R package that implements OmicKriging.
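To make the Kriging step concrete, the sketch below predicts held-out phenotypes from a single similarity matrix (e.g., a genetic relationship matrix); it is an assumed simplification, not the OmicKriging package itself, which additionally combines several omic similarity matrices with estimated weights.

```python
# Sketch: kriging-style phenotype prediction from an omic similarity matrix K.
# All data are simulated; h2 plays the role of a heritability-like weight.
import numpy as np

def krige_predict(K, y_train, train_idx, test_idx, h2=0.5):
    """Predict test phenotypes from the training phenotypes via the blended covariance."""
    n = K.shape[0]
    Sigma = h2 * K + (1.0 - h2) * np.eye(n)          # similarity plus noise component
    S_tt = Sigma[np.ix_(train_idx, train_idx)]       # train-train covariance
    S_nt = Sigma[np.ix_(test_idx, train_idx)]        # test-train covariance
    mu = y_train.mean()
    return mu + S_nt @ np.linalg.solve(S_tt, y_train - mu)

# toy usage with a crude correlation-based similarity matrix
rng = np.random.default_rng(2)
G = rng.normal(size=(100, 500))                      # 100 individuals x 500 markers
K = np.corrcoef(G)
y = rng.normal(size=100)
train, test = np.arange(80), np.arange(80, 100)
print(krige_predict(K, y[train], train, test)[:5])
```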
This paper introduces a modular processing chain to derive global high-resolution maps of leaf traits. In particular, we present global maps at 500 m resolution of specific leaf area, leaf dry matter content, leaf nitrogen and phosphorus content per dry mass, and leaf nitrogen/phosphorus ratio. The processing chain exploits machine learning techniques along with optical remote sensing data (MODIS/Landsat) and climate data for gap filling and upscaling of in-situ measured leaf traits. The chain first uses random forest regression with surrogates to fill gaps in the database ($>45\%$ of entries missing) and maximize the global representativeness of the trait dataset. Along with the estimated global maps of leaf traits, we provide associated uncertainty estimates derived from the regression models. The processing chain is modular and can easily accommodate new traits, data streams (trait databases and remote sensing data), and methods. The machine learning techniques applied allow attribution of information gain to the input data and thus provide an opportunity to understand trait-environment relationships at the plant and ecosystem scales.
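The sketch below illustrates, under assumed variable names and simulated data, the two roles random forest regression plays in a chain like the one described: filling gaps in a trait table and then upscaling the completed trait with gridded remote-sensing/climate predictors. It is not the study's processing chain.

```python
# Sketch: (1) gap-filling a partially missing leaf trait from co-measured traits,
# (2) upscaling the completed trait with gridded predictors. All data simulated.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
n = 300
traits = pd.DataFrame({
    "SLA": rng.gamma(2.0, 8.0, n),        # specific leaf area
    "leaf_N": rng.normal(20, 4, n),       # leaf nitrogen per dry mass
    "leaf_P": rng.normal(1.5, 0.3, n),    # leaf phosphorus per dry mass
})
traits.loc[rng.random(n) < 0.45, "leaf_P"] = np.nan   # ~45% missing entries

# (1) gap filling: predict the missing trait from the observed ones
obs = traits["leaf_P"].notna()
rf_fill = RandomForestRegressor(n_estimators=300, random_state=0)
rf_fill.fit(traits.loc[obs, ["SLA", "leaf_N"]], traits.loc[obs, "leaf_P"])
traits.loc[~obs, "leaf_P"] = rf_fill.predict(traits.loc[~obs, ["SLA", "leaf_N"]])

# (2) upscaling: relate the completed trait to remote-sensing/climate predictors,
# then predict on a grid of those predictors
predictors = pd.DataFrame(rng.normal(size=(n, 3)), columns=["NDVI", "temp", "precip"])
rf_up = RandomForestRegressor(n_estimators=300, random_state=0)
rf_up.fit(predictors, traits["leaf_P"])
grid = pd.DataFrame(rng.normal(size=(10, 3)), columns=predictors.columns)
print(rf_up.predict(grid))
```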