Heavy-tailed metrics are common in the online world and often critical to product evaluation. Even when samples are large enough for the Central Limit Theorem to apply, experimentation remains challenging because the confidence intervals of the estimates are wide. We demonstrate the difficulty by running A/A simulations with customer spending data from a large-scale e-commerce site, and then explore solutions. On one front, we address the heavy tail directly and highlight the often-ignored nuances of winsorization; in particular, the validity of the false positive rate could be at risk. Inspired by robust statistics, we further introduce Huber regression as a better way to measure the treatment effect. On another front, we exploit covariates from the pre-experiment period. Although these covariates are independent of assignment and can potentially explain much of the variation in the response, the concern is that models are trained to minimize prediction error rather than the bias of the parameter of interest. We find the framework of orthogonal learning useful: it matches not raw observations but residuals from two predictions, one of the response and the other of the assignment. Robust regression is readily integrated, together with cross-fitting. The final design proves highly effective in driving down variance while controlling bias. It is empowering our daily practice and can hopefully benefit other applications in the industry.
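To make the residual-matching idea concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not the authors' implementation: both nuisance predictions are simple linear fits on a single pre-experiment covariate, cross-fitting uses interleaved folds, and the Huber fit is a no-intercept slope obtained by iteratively reweighted least squares with a MAD-based scale. The function names (`fit_line`, `huber_slope`, `orthogonal_huber_effect`, `simulate`) and the synthetic data are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def huber_slope(u, v, delta=1.345, iters=50):
    """Slope of v on u (no intercept) under Huber loss, fitted by IRLS."""
    # Start from the ordinary least-squares slope.
    theta = sum(a * b for a, b in zip(u, v)) / sum(a * a for a in u)
    for _ in range(iters):
        res = [b - theta * a for a, b in zip(u, v)]
        # Robust scale from the median absolute deviation (assumed choice).
        med = statistics.median(res)
        mad = statistics.median([abs(r - med) for r in res])
        scale = mad / 0.6745 or 1.0
        cut = delta * scale
        # Huber weights: 1 inside the cut-off, shrinking outside it.
        w = [1.0 if abs(r) <= cut else cut / abs(r) for r in res]
        theta = (sum(wi * a * b for wi, a, b in zip(w, u, v))
                 / sum(wi * a * a for wi, a in zip(w, u)))
    return theta


def orthogonal_huber_effect(x, t, y, k=2):
    """Cross-fitted residual-on-residual estimate of the treatment effect."""
    n = len(y)
    y_res, t_res = [0.0] * n, [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        # Nuisance models are fitted on the other folds only.
        ay, by = fit_line([x[i] for i in train], [y[i] for i in train])
        at, bt = fit_line([x[i] for i in train], [t[i] for i in train])
        for i in held_out:
            y_res[i] = y[i] - (ay + by * x[i])  # response residual
            t_res[i] = t[i] - (at + bt * x[i])  # assignment residual
    return huber_slope(t_res, y_res)


def simulate(n, tau, seed):
    """Synthetic data: additive effect tau plus rare, very large outliers."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    t = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
    y = []
    for xi, ti in zip(x, t):
        noise = rng.gauss(0.0, 1.0)
        if rng.random() < 0.02:
            noise += rng.expovariate(1 / 50)  # heavy right tail
        y.append(3.0 * xi + tau * ti + noise)
    return x, t, y


x, t, y = simulate(4000, tau=2.0, seed=7)
tau_hat = orthogonal_huber_effect(x, t, y)
print(f"estimated effect: {tau_hat:.3f}")
```

The simulated effect here is purely additive, so the Huber slope and the mean difference target the same quantity; with real spending data that equivalence would need to be checked rather than assumed.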
The autoregressive (AR) model is a widely used model to understand time series data. Traditionally, the innovation noise of the AR is modeled as Gaussian. However, many time series applications, for example, financial time series data, are non-Gaussi
It is important to estimate the local average treatment effect (LATE) when compliance with a treatment assignment is incomplete. The previously proposed methods for LATE estimation required all relevant variables to be jointly observed in a single da
The research described here revisits the classical doubly robust estimation of the average treatment effect through a systematic comparison, in terms of asymptotic efficiency, of all possible combinations of the estim
The intercity freight trips of heavy trucks are important data for transportation system planning and urban agglomeration management. In recent decades, the extraction of freight trips from GPS data has gradually become the main alternative to tradit
The field of precision medicine aims to tailor treatment based on patient-specific factors in a reproducible way. To this end, estimating an optimal individualized treatment regime (ITR) that recommends treatment decisions based on patient characteri