Covariance Estimation and its Application in Large-Scale Online Controlled Experiments

Added by: Yihan Bao

Publication date: 2021

Fields: Mathematical Statistics

Language: English

Authors: Tao Xiong, Yihan Bao, Penglei Zhao

Applications Methodology

Over the last few decades, online controlled experiments (also known as A/B tests) have been adopted as the gold standard for measuring business improvements in industry. In our company, more than a billion users participate in thousands of experiments simultaneously, and because statistical inference and estimation are routinely conducted on thousands of online metrics in those experiments, computational cost becomes a major concern. In this paper we propose a novel algorithm for estimating the covariance of online metrics, which introduces more flexibility into the trade-off between computational cost and precision in covariance estimation. This covariance estimation method reduces the computational cost of metric calculation in large-scale settings, which facilitates further applications in both online controlled experiments and adaptive experiment scenarios, such as variance reduction, continuous monitoring, and Bayesian optimization, and it can be easily implemented in engineering practice.
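The abstract does not describe the proposed algorithm itself, so the following is only a generic illustration of why covariance estimation can be made cheap at scale, not the authors' method. It is a minimal sketch, assuming two per-user metrics `x` and `y`, in which each data shard keeps a few running sums that can be merged later. The class name `CovStats` and its methods are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class CovStats:
    """Mergeable sufficient statistics for the sample covariance of two metrics.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n: int = 0
    sum_x: float = 0.0
    sum_y: float = 0.0
    sum_xy: float = 0.0

    def add(self, x: float, y: float) -> None:
        # Update the running sums with one (x, y) observation.
        self.n += 1
        self.sum_x += x
        self.sum_y += y
        self.sum_xy += x * y

    def merge(self, other: "CovStats") -> "CovStats":
        # Combine statistics from two shards without revisiting raw data.
        return CovStats(
            self.n + other.n,
            self.sum_x + other.sum_x,
            self.sum_y + other.sum_y,
            self.sum_xy + other.sum_xy,
        )

    def covariance(self) -> float:
        # Unbiased sample covariance computed from the running sums.
        if self.n < 2:
            raise ValueError("need at least two observations")
        return (self.sum_xy - self.sum_x * self.sum_y / self.n) / (self.n - 1)


# Usage: two shards processed independently, then merged.
shard_a = CovStats()
shard_a.add(1.0, 2.0)
shard_a.add(2.0, 4.0)
shard_b = CovStats()
shard_b.add(3.0, 6.0)
shard_b.add(4.0, 9.0)
merged = shard_a.merge(shard_b)
print(merged.covariance())
```

Each shard stores only four numbers regardless of its size, which is what makes the merge step cheap. The sum-of-products formula can lose floating-point precision when the means are large relative to the spread, so a production system would likely prefer a numerically stabler update.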

Covariance Matrix Estimation under Total Positivity for Portfolio Selection

Raj Agrawal, Uma Roy, Caroline Uhler, 2019

Selecting the optimal Markowitz portfolio depends on estimating the covariance matrix of the returns of $N$ assets from $T$ periods of historical data. Problematically, $N$ is typically of the same order as $T$, which makes the sample covariance matrix estimator perform poorly, both empirically and theoretically. While various other general-purpose covariance matrix estimators have been introduced in the financial economics and statistics literature for dealing with the high dimensionality of this problem, we here propose an estimator that exploits the fact that assets are typically positively dependent. This is achieved by imposing that the joint distribution of returns be multivariate totally positive of order 2 ($\text{MTP}_2$). This constraint on the covariance matrix not only enforces positive dependence among the assets, but also regularizes the covariance matrix, leading to desirable statistical properties such as sparsity. Based on stock-market data spanning over thirty years, we show that estimating the covariance matrix under $\text{MTP}_2$ outperforms previous state-of-the-art methods including shrinkage estimators and factor models.

Applications Methodology

Large-scale Uncertainty Estimation and Its Application in Revenue Forecast of SMEs

Zebang Zhang, Kui Zhao, Kai Huang, 2020

The economic and banking importance of the small and medium enterprise (SME) sector is well recognized in contemporary society. Business credit loans are very important for the operation of SMEs, and revenue is a key indicator for credit limit management. Therefore, it is very beneficial to construct a reliable revenue forecasting model. If the uncertainty of an enterprise's revenue forecast can be estimated, a more appropriate credit limit can be granted. The natural gradient boosting approach estimates the uncertainty of a prediction with a multi-parameter boosting algorithm based on the natural gradient. However, its original implementation does not scale easily to big data scenarios and is computationally expensive compared to state-of-the-art tree-based models (such as XGBoost). In this paper, we propose a Scalable Natural Gradient Boosting Machine that is simple to implement, readily parallelizable, interpretable, and yields high-quality predictive uncertainty estimates. According to the characteristics of the revenue distribution, we derive an uncertainty quantification function. We demonstrate that our method can distinguish between accurate and inaccurate samples in revenue forecasting of SMEs. What's more, interpretability can be naturally obtained from the model, satisfying financial needs.

Machine Learning

Data integration for high-resolution, continental-scale estimation of air pollution concentrations

Matthew L. Thomas, Gavin Shaddick, Daniel Simpson, 2019

Air pollution constitutes the highest environmental risk factor in relation to health. In order to provide the evidence required for health impact analyses, to inform policy and to develop potential mitigation strategies, comprehensive information is required on the state of air pollution. Information on air pollution traditionally comes from ground monitoring (GM) networks, but these may not be able to provide sufficient coverage and may need to be supplemented with information from other sources (e.g. chemical transport models; CTMs). However, these may only be available on grids and may not capture micro-scale features that may be important in assessing air quality in areas of high population. We develop a model that allows calibration between multiple data sources available at different levels of support by allowing the coefficients of calibration equations to vary over space and time, enabling downscaling where the data is sufficient to support it. The model is used to produce high-resolution (1km $\times$ 1km) estimates of NO$_2$ and PM$_{2.5}$ across Western Europe for 2010-2016. Concentrations of both pollutants are decreasing during this period; however, there remain large populations exposed to levels exceeding the WHO Air Quality Guidelines, and thus air pollution remains a serious threat to health.

Applications Methodology

CARPool Covariance: Fast, unbiased covariance estimation for large-scale structure observables

Nicolas Chartier, Benjamin D. Wandelt, 2021

The covariance matrix $\boldsymbol{\Sigma}$ of non-linear clustering statistics that are measured in current and upcoming surveys is of fundamental interest for comparing cosmological theory and data and a crucial ingredient for the likelihood approximations underlying widely used parameter inference and forecasting methods. The extreme number of simulations needed to estimate $\boldsymbol{\Sigma}$ to sufficient accuracy poses a severe challenge. Approximating $\boldsymbol{\Sigma}$ using inexpensive but biased surrogates introduces model error with respect to full simulations, especially in the non-linear regime of structure growth. To address this problem we develop a matrix generalization of Convergence Acceleration by Regression and Pooling (CARPool) to combine a small number of simulations with fast surrogates and obtain low-noise estimates of $\boldsymbol{\Sigma}$ that are unbiased by construction. Our numerical examples use CARPool to combine GADGET-III $N$-body simulations with fast surrogates computed using COmoving Lagrangian Acceleration (COLA). Even at the challenging redshift $z=0.5$, we find variance reductions of at least $\mathcal{O}(10^1)$ and up to $\mathcal{O}(10^4)$ for the elements of the matter power spectrum covariance matrix on scales $8.9\times 10^{-3} < k_\mathrm{max} < 1.0$ $h\,{\rm Mpc^{-1}}$. We demonstrate comparable performance for the covariance of the matter bispectrum, the matter correlation function and probability density function of the matter density field. We compare eigenvalues, likelihoods, and Fisher matrices computed using the CARPool covariance estimate with the standard sample covariance estimators and generally find considerable improvement except in cases where $\Sigma$ is severely ill-conditioned.
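Combining a few expensive simulations with cheap surrogates is, in its scalar form, a classical control-variate estimate. The sketch below is a minimal illustration of that generic scalar technique, not the matrix generalization the abstract describes. It assumes paired samples `y` (expensive) and `c` (surrogate) and a surrogate mean `mu_c` that is known or estimated separately from many cheap runs; the function name and all variable names are hypothetical.

```python
import statistics


def control_variate_estimate(y, c, mu_c):
    """Estimate the mean of y using paired surrogate samples c with mean mu_c.

    Generic scalar control-variate sketch for illustration only.
    """
    n = len(y)
    if n != len(c) or n < 2:
        raise ValueError("need at least two paired samples")
    y_bar = statistics.fmean(y)
    c_bar = statistics.fmean(c)
    # Sample covariance between the expensive and surrogate outputs.
    cov_yc = sum((yi - y_bar) * (ci - c_bar) for yi, ci in zip(y, c)) / (n - 1)
    var_c = statistics.variance(c)
    # Regression coefficient that minimizes the variance of the corrected mean.
    beta = cov_yc / var_c
    # Correct the plain sample mean by the surrogate's deviation from mu_c.
    return y_bar - beta * (c_bar - mu_c)


# Usage: here y is an exact linear function of c, so the correction is exact.
c_samples = [1.0, 2.0, 4.0]
y_samples = [2.0 * ci + 3.0 for ci in c_samples]
print(control_variate_estimate(y_samples, c_samples, mu_c=5.0))
```

The more strongly `y` and `c` are correlated, the more variance the correction removes. Estimating `beta` from the same samples used for the mean generally introduces a small bias, so this sketch makes no claim to the unbiasedness property stated in the abstract.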

Cosmology and Nongalactic Astrophysics Instrumentation and Methods for Astrophysics

Democratizing online controlled experiments at Booking.com

Raphael Lopez Kaufman, Jegar Pitchforth, Lukas Vermeer, 2017

There is an extensive literature about online controlled experiments, covering the statistical methods available to analyze experiment results, the infrastructure built by several large-scale Internet companies, and the organizational challenges of embracing online experiments to inform product development. At Booking.com we have been conducting evidence-based product development using online experiments for more than ten years. Our methods and infrastructure were designed from their inception to reflect Booking.com culture, that is, with democratization and decentralization of experimentation and decision making in mind. In this paper we explain how four practices have allowed such a large organization as Booking.com to truly and successfully democratize experimentation: building a central repository of successes and failures to allow for knowledge sharing; having a generic and extensible code library which enforces a loose coupling between experimentation and business logic; monitoring closely and transparently the quality and the reliability of the data gathering pipelines to build trust in the experimentation infrastructure; and putting in place safeguards to enable anyone to have end-to-end ownership of their experiments.

Human-Computer Interaction
