
Covariance Estimation and its Application in Large-Scale Online Controlled Experiments

Published by Yihan Bao
Publication date: 2021
Research field: Mathematical statistics
Paper language: English





During the last few decades, online controlled experiments (also known as A/B tests) have been adopted as the gold standard for measuring business improvements in industry. In our company, more than a billion users participate in thousands of simultaneous experiments, and because statistical inference and estimation are routinely performed on thousands of online metrics in those experiments, computational cost becomes a major concern. In this paper we propose a novel algorithm for estimating the covariance of online metrics, which introduces more flexibility into the trade-off between computational cost and precision in covariance estimation. This method reduces the computational cost of metric calculation in large-scale settings, which facilitates further applications in both online controlled experiments and adaptive experiments, such as variance reduction, continuous monitoring, and Bayesian optimization, and it can be easily implemented in engineering practice.
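The abstract does not spell out the proposed algorithm. As a point of reference only, one generic way to trade computation for precision is to estimate the covariance of two metric means from randomly formed buckets of users instead of from full user-level data. The sketch below implements that generic bucketing technique, not the authors' method; the function names and the use of NumPy are assumptions made for illustration.

```python
import numpy as np


def bucket_mean_covariance(x, y, num_buckets=100, seed=0):
    """Approximate Cov(mean(x), mean(y)) from per-bucket means.

    Each element of x and y is one randomization unit (e.g. one user's metric
    value). Units are shuffled into `num_buckets` near-equal buckets; after that
    only the bucket means are needed, so storage and computation scale with
    num_buckets instead of with the number of users.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same shape")
    rng = np.random.default_rng(seed)
    # Random unit-to-bucket assignment (a stand-in for hashing user ids).
    buckets = np.array_split(rng.permutation(x.size), num_buckets)
    bx = np.array([x[idx].mean() for idx in buckets])
    by = np.array([y[idx].mean() for idx in buckets])
    # With equal bucket sizes, each bucket mean has num_buckets times the
    # covariance of the overall mean, hence the division.
    return np.cov(bx, by)[0, 1] / num_buckets


def full_data_covariance(x, y):
    """Reference: per-unit sample covariance divided by the number of units."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.cov(x, y)[0, 1] / x.size
```

Fewer buckets make aggregation cheaper but leave the estimate with only `num_buckets - 1` degrees of freedom, so it gets noisier; this is one concrete cost/precision knob of the kind the abstract describes, though the paper's own algorithm may work differently.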




Read also

Selecting the optimal Markowitz portfolio depends on estimating the covariance matrix of the returns of $N$ assets from $T$ periods of historical data. Problematically, $N$ is typically of the same order as $T$, which makes the sample covariance matrix estimator perform poorly, both empirically and theoretically. While various other general-purpose covariance matrix estimators have been introduced in the financial economics and statistics literature to deal with the high dimensionality of this problem, we propose an estimator that exploits the fact that assets are typically positively dependent. This is achieved by requiring the joint distribution of returns to be multivariate totally positive of order 2 ($\text{MTP}_2$). This constraint on the covariance matrix not only enforces positive dependence among the assets but also regularizes the covariance matrix, leading to desirable statistical properties such as sparsity. Based on stock-market data spanning over thirty years, we show that estimating the covariance matrix under $\text{MTP}_2$ outperforms previous state-of-the-art methods, including shrinkage estimators and factor models.
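For a Gaussian distribution, $\text{MTP}_2$ has a concrete matrix form: the distribution is $\text{MTP}_2$ exactly when its precision matrix $\Sigma^{-1}$ has no positive off-diagonal entries (that is, it is an M-matrix). The sketch below only checks this condition for a given covariance matrix; it does not implement the authors' constrained estimator, and the function name is hypothetical.

```python
import numpy as np


def is_gaussian_mtp2(cov, tol=1e-10):
    """Check whether N(0, cov) is MTP2: inv(cov) has no positive off-diagonal entries."""
    cov = np.asarray(cov, dtype=float)
    if not np.allclose(cov, cov.T):
        raise ValueError("covariance must be symmetric")
    np.linalg.cholesky(cov)  # raises LinAlgError unless positive definite
    precision = np.linalg.inv(cov)
    off_diagonal = precision[~np.eye(cov.shape[0], dtype=bool)]
    return bool(np.all(off_diagonal <= tol))


# Equicorrelated assets with correlation 0.5: MTP2.
print(is_gaussian_mtp2(0.5 * np.eye(3) + 0.5 * np.ones((3, 3))))  # True
# All correlations are >= 0, yet not MTP2 (precision entry for assets 2 and 3 is +0.5).
print(is_gaussian_mtp2([[1, .5, .5], [.5, 1, 0], [.5, 0, 1]]))  # False
```

The second example shows why the constraint is stronger than "all correlations nonnegative": it restricts partial correlations, which is also what produces the regularizing effect mentioned above.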
The economic and banking importance of the small and medium enterprise (SME) sector is well recognized in contemporary society. Business credit loans are very important for the operation of SMEs, and revenue is a key indicator for credit limit management. It is therefore very beneficial to construct a reliable revenue forecasting model: if the uncertainty of an enterprise's revenue forecast can be estimated, a more appropriate credit limit can be granted. The natural gradient boosting (NGBoost) approach estimates the uncertainty of a prediction with a multi-parameter boosting algorithm based on the natural gradient. However, its original implementation does not scale easily to big-data scenarios and is computationally expensive compared with state-of-the-art tree-based models (such as XGBoost). In this paper, we propose Scalable Natural Gradient Boosting Machines, which are simple to implement, readily parallelizable, and interpretable, and which yield high-quality predictive uncertainty estimates. Based on the characteristics of the revenue distribution, we derive an uncertainty quantification function. We demonstrate that our method can distinguish between accurate and inaccurate samples in revenue forecasting for SMEs. Moreover, interpretability can be obtained naturally from the model, satisfying financial requirements.
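In NGBoost, each boosting stage fits base learners to the natural gradient of a scoring rule with respect to the distribution's parameters. For a Normal predictive distribution parameterized by $(\mu, \log\sigma)$, the Fisher information is $\mathrm{diag}(1/\sigma^2, 2)$, so the natural gradient of the negative log-likelihood has a closed form. The sketch below computes it and, as a sanity check, runs natural-gradient descent on a single constant distribution (no trees), which should converge to the sample mean and standard deviation. It assumes a Normal distribution for illustration (the authors derive their own revenue-specific uncertainty function), is not the proposed scalable method, and uses hypothetical function names.

```python
import numpy as np


def normal_natural_gradient(y, mu, log_sigma):
    """Per-sample natural gradient of the Normal NLL w.r.t. (mu, log_sigma)."""
    y = np.asarray(y, dtype=float)
    sigma2 = np.exp(2.0 * log_sigma)
    resid = y - mu
    grad_mu = -resid / sigma2                 # d NLL / d mu
    grad_log_sigma = 1.0 - resid**2 / sigma2  # d NLL / d log_sigma
    # Multiply by the inverse Fisher information diag(sigma^2, 1/2).
    nat_mu = grad_mu * sigma2                 # simplifies to -resid
    nat_log_sigma = grad_log_sigma / 2.0
    return np.stack([nat_mu, nat_log_sigma], axis=-1)


def fit_constant_normal(y, lr=0.1, n_iter=500):
    """Natural-gradient descent for one shared (mu, sigma); a stand-in for boosting."""
    mu, log_sigma = 0.0, 0.0
    for _ in range(n_iter):
        step = normal_natural_gradient(y, mu, log_sigma).mean(axis=0)
        mu -= lr * step[0]
        log_sigma -= lr * step[1]
    return mu, float(np.exp(log_sigma))
```

In a boosting model, the per-sample rows returned by `normal_natural_gradient` would be the regression targets for the next stage's base learners instead of being averaged.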
Air pollution constitutes the highest environmental risk factor in relation to health. In order to provide the evidence required for health impact analyses, to inform policy, and to develop potential mitigation strategies, comprehensive information is required on the state of air pollution. Information on air pollution traditionally comes from ground monitoring (GM) networks, but these may not provide sufficient coverage and may need to be supplemented with information from other sources (e.g. chemical transport models; CTMs). However, these may only be available on grids and may not capture micro-scale features that can be important in assessing air quality in areas of high population. We develop a model that allows calibration between multiple data sources available at different levels of support by allowing the coefficients of calibration equations to vary over space and time, enabling downscaling where the data are sufficient to support it. The model is used to produce high-resolution (1 km $\times$ 1 km) estimates of NO$_2$ and PM$_{2.5}$ across Western Europe for 2010-2016. Concentrations of both pollutants decreased during this period; however, large populations remain exposed to levels exceeding the WHO Air Quality Guidelines, so air pollution remains a serious threat to health.
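The core idea of a calibration equation whose coefficients vary over space can be illustrated with a deliberately crude stand-in: fit $\text{GM} = a_r + b_r \cdot \text{CTM}$ separately by ordinary least squares in each region $r$, then apply each region's coefficients to that region's CTM grid cells. The authors' model lets coefficients vary smoothly over space and time within a single joint model; the piecewise-constant regions, the OLS fitting, and the function names below are all simplifying assumptions.

```python
import numpy as np


def fit_regional_calibration(gm, ctm, region):
    """Fit gm = a_r + b_r * ctm by OLS separately within each region r."""
    gm = np.asarray(gm, dtype=float)
    ctm = np.asarray(ctm, dtype=float)
    region = np.asarray(region)
    coefs = {}
    for r in np.unique(region):
        mask = region == r
        X = np.column_stack([np.ones(mask.sum()), ctm[mask]])
        (a, b), *_ = np.linalg.lstsq(X, gm[mask], rcond=None)
        coefs[r] = (a, b)
    return coefs


def calibrate(ctm, region, coefs):
    """Apply regional coefficients to CTM values; regions without a fit get NaN."""
    ctm = np.asarray(ctm, dtype=float)
    region = np.asarray(region)
    out = np.full(ctm.shape, np.nan)
    for r, (a, b) in coefs.items():
        mask = region == r
        out[mask] = a + b * ctm[mask]
    return out
```

Returning NaN for regions without monitors mirrors, in a very rough way, the point that downscaling is only possible where the data are sufficient to support it.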
The covariance matrix $\boldsymbol{\Sigma}$ of non-linear clustering statistics measured in current and upcoming surveys is of fundamental interest for comparing cosmological theory and data, and is a crucial ingredient of the likelihood approximations underlying widely used parameter inference and forecasting methods. The extreme number of simulations needed to estimate $\boldsymbol{\Sigma}$ to sufficient accuracy poses a severe challenge. Approximating $\boldsymbol{\Sigma}$ using inexpensive but biased surrogates introduces model error with respect to full simulations, especially in the non-linear regime of structure growth. To address this problem, we develop a matrix generalization of Convergence Acceleration by Regression and Pooling (CARPool) that combines a small number of simulations with fast surrogates to obtain low-noise estimates of $\boldsymbol{\Sigma}$ that are unbiased by construction. Our numerical examples use CARPool to combine GADGET-III $N$-body simulations with fast surrogates computed using COmoving Lagrangian Acceleration (COLA). Even at the challenging redshift $z=0.5$, we find variance reductions of at least $\mathcal{O}(10^1)$ and up to $\mathcal{O}(10^4)$ for the elements of the matter power spectrum covariance matrix on scales $8.9\times 10^{-3} < k_\mathrm{max} < 1.0\,h\,\mathrm{Mpc}^{-1}$. We demonstrate comparable performance for the covariance of the matter bispectrum, the matter correlation function, and the probability density function of the matter density field. We compare eigenvalues, likelihoods, and Fisher matrices computed using the CARPool covariance estimate with those from standard sample covariance estimators and generally find considerable improvement, except in cases where $\boldsymbol{\Sigma}$ is severely ill-conditioned.
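The control-variate idea behind a matrix CARPool estimate can be sketched with a fixed scalar coefficient $\beta$: take the sample covariance of the few expensive simulations, subtract the sample covariance of surrogates run with the same initial conditions, and add back a precise surrogate covariance obtained from many cheap runs, giving $\hat{\boldsymbol{\Sigma}} = \hat{\boldsymbol{\Sigma}}_{\text{sim}} - \beta(\hat{\boldsymbol{\Sigma}}_{\text{sur}} - \bar{\boldsymbol{\Sigma}}_{\text{sur}})$. If $\beta$ is fixed in advance and $\bar{\boldsymbol{\Sigma}}_{\text{sur}}$ is unbiased, the expectation equals $\boldsymbol{\Sigma}_{\text{sim}}$ regardless of surrogate bias. The scalar $\beta$, the function name, and the toy linear "simulator" in the check are assumptions; the paper may choose $\beta$ differently.

```python
import numpy as np


def carpool_covariance(sim, sur_paired, sur_cov_precise, beta=1.0):
    """Control-variate covariance estimate in the spirit of CARPool.

    sim:             (n_pairs, p) expensive simulation outputs.
    sur_paired:      (n_pairs, p) cheap surrogate outputs run with the SAME seeds
                     (initial conditions) as `sim`, so their noise is correlated.
    sur_cov_precise: (p, p) surrogate covariance estimated from many cheap runs.
    beta:            control-variate coefficient, fixed in advance (assumption).
    """
    sim = np.asarray(sim, dtype=float)
    sur_paired = np.asarray(sur_paired, dtype=float)
    if sim.shape != sur_paired.shape:
        raise ValueError("sim and sur_paired must have the same shape")
    sim_cov = np.cov(sim, rowvar=False)         # noisy but unbiased for Sigma_sim
    sur_cov = np.cov(sur_paired, rowvar=False)  # noisy, fluctuates with sim_cov
    # Subtracting the paired surrogate fluctuation cancels much of the sampling
    # noise; adding back the precise surrogate covariance keeps the mean at
    # Sigma_sim when beta is fixed and sur_cov_precise is unbiased.
    return sim_cov - beta * (sur_cov - sur_cov_precise)
```

The gain depends entirely on how strongly the paired surrogate fluctuations track the simulation fluctuations; a biased surrogate is acceptable, a poorly correlated one is not.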
There is an extensive literature on online controlled experiments, covering the statistical methods available to analyze experiment results, the infrastructure built by several large-scale Internet companies, and the organizational challenges of embracing online experiments to inform product development. At Booking.com we have been conducting evidence-based product development using online experiments for more than ten years. Our methods and infrastructure were designed from their inception to reflect Booking.com culture, that is, with democratization and decentralization of experimentation and decision making in mind. In this paper we explain how several practices have allowed an organization as large as Booking.com to truly and successfully democratize experimentation: building a central repository of successes and failures to allow for knowledge sharing; having a generic and extensible code library that enforces a loose coupling between experimentation and business logic; closely and transparently monitoring the quality and reliability of the data-gathering pipelines to build trust in the experimentation infrastructure; and putting safeguards in place that enable anyone to have end-to-end ownership of their experiments.