
Causal Feature Selection via Orthogonal Search

Published by Anant Raj
Publication date: 2020
Language: English





The problem of inferring the direct causal parents of a response variable among a large set of explanatory variables is of high practical importance in many disciplines. Recent work in the field of causal discovery exploits invariance properties of models across different experimental conditions for detecting direct causal links. However, these approaches generally do not scale well with the number of explanatory variables, are difficult to extend to nonlinear relationships, and require data across different experiments. Inspired by debiased machine learning methods, we study a one-vs.-the-rest feature selection approach to discover the direct causal parents of the response. We propose an algorithm that works for purely observational data, while also offering theoretical guarantees, including the case of partially nonlinear relationships. Requiring only one estimation for each variable, we can apply our approach even to large graphs, demonstrating significant improvements compared to established approaches.
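The one-vs.-the-rest idea in the abstract can be illustrated with a generic cross-fitted "partialling-out" estimator from the debiased machine learning literature. The sketch below is an assumption-laden illustration, not the authors' exact procedure: the function names (`ols_fit_predict`, `orthogonal_scores`, `select_parents`), the use of ordinary least squares as the nuisance learner, the two-fold split, and the z-score threshold are all hypothetical choices made here. For each variable, it regresses both the response and that variable on the remaining variables, then regresses residual on residual, so each variable needs only one estimation.

```python
import numpy as np

def ols_fit_predict(A_train, b_train, A_test):
    """Fit least squares with an intercept on the training split, predict on the test split."""
    A1 = np.column_stack([np.ones(len(A_train)), A_train])
    coef, *_ = np.linalg.lstsq(A1, b_train, rcond=None)
    return np.column_stack([np.ones(len(A_test)), A_test]) @ coef

def orthogonal_scores(X, y, n_folds=2, seed=0):
    """Return (theta, z) per feature from a cross-fitted residual-on-residual regression.

    Assumption: OLS stands in for the nuisance learner; any regressor could be swapped in.
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    theta = np.zeros(d)
    z = np.zeros(d)
    for j in range(d):
        rest = np.delete(X, j, axis=1)  # all variables except the j-th
        res_x = np.zeros(n)
        res_y = np.zeros(n)
        for test_idx in folds:
            train = np.ones(n, dtype=bool)
            train[test_idx] = False
            # Out-of-fold residuals of X_j and y after removing the other variables.
            res_x[test_idx] = X[test_idx, j] - ols_fit_predict(
                rest[train], X[train, j], rest[test_idx])
            res_y[test_idx] = y[test_idx] - ols_fit_predict(
                rest[train], y[train], rest[test_idx])
        denom = np.mean(res_x ** 2)
        theta[j] = np.mean(res_x * res_y) / denom
        eps = res_y - theta[j] * res_x
        # Plug-in standard error of the orthogonal-score estimator.
        se = np.sqrt(np.mean((res_x * eps) ** 2) / denom ** 2 / n)
        z[j] = theta[j] / se
    return theta, z

def select_parents(X, y, z_threshold=3.0):
    """Indices whose orthogonalized coefficient is significantly non-zero (hypothetical rule)."""
    _, z = orthogonal_scores(X, y)
    return [j for j in range(X.shape[1]) if abs(z[j]) > z_threshold]
```

Calling `select_parents(X, y)` on an `(n, d)` array `X` and a length-`n` response `y` returns the candidate parent indices. Whether a significant coefficient can be read as a direct causal link depends on the graphical assumptions stated in the paper, which this sketch does not check.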




Read also

The problem of inferring the direct causal parents of a response variable among a large set of explanatory variables is of high practical importance in many disciplines. Recent work exploits stability of regression coefficients or invariance properties of models across different experimental conditions for reconstructing the full causal graph. These approaches generally do not scale well with the number of explanatory variables and are difficult to extend to nonlinear relationships. Contrary to existing work, we propose an approach which even works for observational data alone, while still offering theoretical guarantees including the case of partially nonlinear relationships. Our algorithm requires only one estimation for each variable and in our experiments we apply our causal discovery algorithm even to large graphs, demonstrating significant improvements compared to well established approaches.
Online feature selection has been an active research area in recent years. We propose a novel diverse online feature selection method based on Determinantal Point Processes (DPP). Our model aims to provide diverse features which can be composed in either a supervised or unsupervised framework. The framework aims to promote diversity based on the kernel produced on a feature level, through at most three stages: feature sampling, local criteria and global criteria for feature selection. In the feature sampling stage, we sample the incoming stream of features using a conditional DPP. The local criteria are used to assess and select streamed features (i.e. only when they arrive): we use unsupervised scale-invariant methods to remove redundant features and, optionally, supervised methods to introduce label information to assess relevant features. Lastly, the global criteria use regularization methods to select a globally optimal subset of features. This three-stage procedure continues until there are no more features arriving or some predefined stopping condition is met. Our experiments show that this approach yields better compactness, and is comparable to, and in some instances outperforms, other state-of-the-art online feature selection methods.
Most existing studies on the double/debiased machine learning method concentrate on causal parameter estimation recovered from the first-order orthogonal score function. In this paper, we construct the $k^{\mathrm{th}}$-order orthogonal score function for estimating the average treatment effect (ATE) and present an algorithm that enables us to obtain the debiased estimator recovered from the score function. Such a higher-order orthogonal estimator is more robust to misspecification of the propensity score than the first-order one. Besides, it has the merit of being applicable with many machine learning methodologies such as Lasso, Random Forests, Neural Nets, etc. We also conduct comprehensive experiments to test the power of the estimator we construct from the score function using both simulated and real datasets.
We consider the stochastic contextual bandit problem under the high dimensional linear model. We focus on the case where the action space is finite and random, with each action associated with a randomly generated contextual covariate. This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine. However, it is very challenging as we need to balance exploration and exploitation. We propose using doubly growing epochs and estimating the parameter with the best subset selection method, which is easy to implement in practice. This approach achieves $\tilde{\mathcal{O}}(s\sqrt{T})$ regret with high probability, which is nearly independent of the "ambient" regression model dimension $d$. We further attain a sharper $\tilde{\mathcal{O}}(\sqrt{sT})$ regret by using the SupLinUCB framework and match the minimax lower bound of low-dimensional linear stochastic bandit problems. Finally, we conduct extensive numerical experiments to demonstrate the applicability and robustness of our algorithms empirically.
Reliable treatment effect estimation from observational data depends on the availability of all confounding information. While much work has targeted treatment effect estimation from observational data, there is relatively little work in the setting of confounding variable missingness, where collecting more information on confounders is often costly or time-consuming. In this work, we frame this challenge as a problem of feature acquisition of confounding features for causal inference. Our goal is to prioritize acquiring values for a fixed and known subset of missing confounders in samples that lead to efficient average treatment effect estimation. We propose two acquisition strategies based on i) covariate balancing (CB), and ii) reducing statistical estimation error on observed factual outcome error (OE). We compare CB and OE on five common causal effect estimation methods, and demonstrate improved sample efficiency of OE over baseline methods under various settings. We also provide visualizations for further analysis on the difference between our proposed methods.
