No Arabic abstract
We consider problems in which a system receives external emph{perturbations} from time to time. For instance, the system can be a train network in which particular lines are repeatedly disrupted without warning, having an effect on passenger behavior. The goal is to predict changes in the behavior of the system at particular points of interest, such as passenger traffic around stations at the affected rails. We assume that the data available provides records of the system functioning at its natural regime (e.g., the train network without disruptions) and data on cases where perturbations took place. The inference problem is how information concerning perturbations, with particular covariates such as location and time, can be generalized to predict the effect of novel perturbations. We approach this problem from the point of view of a mapping from the counterfactual distribution of the system behavior without disruptions to the distribution of the disrupted system. A variant on emph{distribution regression} is developed for this setup.
We present a Bayesian inference approach to estimating the cumulative mass profile and mean squared velocity profile of a globular cluster given the spatial and kinematic information of its stars. Mock globular clusters with a range of sizes and concentrations are generated from lowered isothermal dynamical models, from which we test the reliability of the Bayesian method to estimate model parameters through repeated statistical simulation. We find that given unbiased star samples, we are able to reconstruct the cluster parameters used to generate the mock cluster and the clusters cumulative mass and mean velocity squared profiles with good accuracy. We further explore how strongly biased sampling, which could be the result of observing constraints, may affect this approach. Our tests indicate that if we instead have biased samples, then our estimates can be off in certain ways that are dependent on cluster morphology. Overall, our findings motivate obtaining samples of stars that are as unbiased as possible. This may be achieved by combining information from multiple telescopes (e.g., Hubble and Gaia), but will require careful modeling of the measurement uncertainties through a hierarchical model, which we plan to pursue in future work.
We present a new method for counterfactual explanations (CFEs) based on Bayesian optimisation that applies to both classification and regression models. Our method is a globally convergent search algorithm with support for arbitrary regression models and constraints like feature sparsity and actionable recourse, and furthermore can answer multiple counterfactual questions in parallel while learning from previous queries. We formulate CFE search for regression models in a rigorous mathematical framework using differentiable potentials, which resolves robustness issues in threshold-based objectives. We prove that in this framework, (a) verifying the existence of counterfactuals is NP-complete; and (b) that finding instances using such potentials is CLS-complete. We describe a unified algorithm for CFEs using a specialised acquisition function that composes both expected improvement and an exponential-polynomial (EP) family with desirable properties. Our evaluation on real-world benchmark domains demonstrate high sample-efficiency and precision.
This paper is concerned with making Bayesian inference from data that are assumed to be drawn from a Bingham distribution. A barrier to the Bayesian approach is the parameter-dependent normalising constant of the Bingham distribution, which, even when it can be evaluated or accurately approximated, would have to be calculated at each iteration of an MCMC scheme, thereby greatly increasing the computational burden. We propose a method which enables exact (in Monte Carlo sense) Bayesian inference for the unknown parameters of the Bingham distribution by completely avoiding the need to evaluate this constant. We apply the method to simulated and real data, and illustrate that it is simpler to implement, faster, and performs better than an alternative algorithm that has recently been proposed in the literature.
This paper introduces a unified framework of counterfactual estimation for time-series cross-sectional data, which estimates the average treatment effect on the treated by directly imputing treated counterfactuals. Examples include the fixed effects counterfactual estimator, interactive fixed effects counterfactual estimator, and matrix completion estimator. These estimators provide more reliable causal estimates than conventional twoway fixed effects models when treatment effects are heterogeneous or unobserved time-varying confounders exist. Under this framework, we propose a new dynamic treatment effects plot, as well as several diagnostic tests, to help researchers gauge the validity of the identifying assumptions. We illustrate these methods with two political economy examples and develop an open-source package, fect, in both R and Stata to facilitate implementation.
Counterfactual inference is a useful tool for comparing outcomes of interventions on complex systems. It requires us to represent the system in form of a structural causal model, complete with a causal diagram, probabilistic assumptions on exogenous variables, and functional assignments. Specifying such models can be extremely difficult in practice. The process requires substantial domain expertise, and does not scale easily to large systems, multiple systems, or novel system modifications. At the same time, many application domains, such as molecular biology, are rich in structured causal knowledge that is qualitative in nature. This manuscript proposes a general approach for querying a causal biological knowledge graph, and converting the qualitative result into a quantitative structural causal model that can learn from data to answer the question. We demonstrate the feasibility, accuracy and versatility of this approach using two case studies in systems biology. The first demonstrates the appropriateness of the underlying assumptions and the accuracy of the results. The second demonstrates the versatility of the approach by querying a knowledge base for the molecular determinants of a severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-induced cytokine storm, and performing counterfactual inference to estimate the causal effect of medical countermeasures for severely ill patients.