Despite an increasing reliance on fully automated algorithmic decision-making in our day-to-day lives, human beings still make highly consequential decisions. As frequently seen in business, healthcare, and public policy, recommendations produced by algorithms are provided to human decision-makers to guide their decisions. While there exists a fast-growing literature evaluating the bias and fairness of such algorithmic recommendations, an overlooked question is whether they help humans make better decisions. We develop a statistical methodology for experimentally evaluating the causal impacts of algorithmic recommendations on human decisions. We also show how to examine whether algorithmic recommendations improve the fairness of human decisions and derive the optimal decision rules under various settings. We apply the proposed methodology to preliminary data from the first-ever randomized controlled trial that evaluates the pretrial Public Safety Assessment (PSA) in the criminal justice system. A goal of the PSA is to help judges decide which arrested individuals should be released. On the basis of the preliminary data available, we find that providing the PSA to the judge has little overall impact on the judge's decisions and subsequent arrestee behavior. However, our analysis yields some suggestive evidence that the PSA may help avoid unnecessarily harsh decisions for female arrestees regardless of their risk levels, while it encourages the judge to make stricter decisions for male arrestees who are deemed to be risky. In terms of fairness, the PSA appears to increase the gender bias against males while having little effect on any existing racial differences in the judge's decisions. Finally, we find that the PSA's recommendations might be unnecessarily severe unless the cost of a new crime is sufficiently high.
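The experimental comparison described in the abstract above can be illustrated with a minimal sketch. It assumes a plain difference-in-means estimator with a Neyman-style standard error, which is a standard analysis for randomized trials and not necessarily the methodology the authors develop. The function name `diff_in_means` and the toy data are hypothetical and are not drawn from the PSA trial.

```python
import math
from statistics import mean, variance


def diff_in_means(treated, control):
    """Difference-in-means estimate and its conservative (Neyman) standard error."""
    estimate = mean(treated) - mean(control)
    # Sample variances (n - 1 denominator) of each arm, scaled by arm size.
    std_err = math.sqrt(
        variance(treated) / len(treated) + variance(control) / len(control)
    )
    return estimate, std_err


# Hypothetical toy data: 1 = judge imposed a stricter decision, 0 = otherwise.
psa_provided = [1, 0, 0, 1, 0, 0, 1, 0]  # cases where the judge saw the PSA
psa_withheld = [1, 0, 1, 1, 0, 0, 1, 0]  # cases where the PSA was withheld

est, se = diff_in_means(psa_provided, psa_withheld)
ci_low, ci_high = est - 1.96 * se, est + 1.96 * se  # normal-approximation 95% interval
print(f"estimated effect: {est:.3f} (SE {se:.3f}), 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```

With so few cases the interval is wide and covers zero, which is the kind of result one would read as "little overall impact"; subgroup effects (for example by gender) could be examined by applying the same function within each subgroup.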
This article surveys the use of algorithmic systems to support decision-making in the public sector. Governments adopt, procure, and use algorithmic systems to support their functions within several contexts -- including criminal justice, education, and benefits provision -- with important consequences for accountability, privacy, social inequity, and public participation in decision-making. We explore the social implications of municipal algorithmic systems across a variety of stages, including problem formulation, technology acquisition, deployment, and evaluation. We highlight several open questions that require further empirical research.
How to attribute responsibility for the actions of autonomous artificial intelligence (AI) systems has been widely debated across the humanities and social science disciplines. This work presents two experiments ($N$=200 each) that measure people's perceptions of eight different notions of moral responsibility concerning AI and human agents in the context of bail decision-making. Using vignettes adapted from real-life cases, our experiments show that AI agents are held causally responsible and blamed similarly to human agents for an identical task. However, there was a meaningful difference in how people perceived these agents' moral responsibility; human agents were ascribed a higher degree of present-looking and forward-looking notions of responsibility than AI agents. We also found that people expect both AI and human decision-makers and advisors to justify their decisions regardless of their nature. We discuss policy and HCI implications of these findings, such as the need for explainable AI in high-stakes scenarios.
Using the concept of principal stratification from the causal inference literature, we introduce a new notion of fairness, called principal fairness, for human and algorithmic decision-making. The key idea is that one should not discriminate among individuals who would be similarly affected by the decision. Unlike the existing statistical definitions of fairness, principal fairness explicitly accounts for the fact that individuals can be impacted by the decision. We propose an axiomatic assumption that all groups are created equal. This assumption is motivated by a belief that protected attributes such as race and gender should have no direct causal effects on potential outcomes. Under this assumption, we show that principal fairness implies all three existing statistical fairness criteria once we account for relevant covariates. This result also highlights the essential role of conditioning covariates in resolving the previously recognized tradeoffs between the existing statistical fairness criteria. Finally, we discuss how to empirically choose conditioning covariates and then evaluate the principal fairness of a particular decision.
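The key idea in the abstract above can be written down as a short sketch. The notation is an assumption introduced here for illustration and does not appear in the abstract: $D_i \in \{0, 1\}$ is the decision for individual $i$, $A_i$ is the protected attribute, $Y_i(d)$ is the potential outcome under decision $d$, and $R_i = (Y_i(0), Y_i(1))$ is the principal stratum, i.e. how the individual would be affected by each decision.

```latex
% One way to formalize "do not discriminate among individuals who would be
% similarly affected by the decision" (notation assumed, not from the abstract):
% within each principal stratum r, the decision rate does not depend on A.
\[
  \Pr(D_i = 1 \mid R_i = r,\ A_i = a)
  \;=\;
  \Pr(D_i = 1 \mid R_i = r,\ A_i = a')
  \qquad \text{for all } r,\ a,\ a'.
\]
% Equivalently, D_i is conditionally independent of A_i given R_i.
```

Because $R_i$ involves both potential outcomes and only one is ever observed, a criterion of this form cannot be checked directly from data without further assumptions, which is consistent with the abstract's emphasis on assumptions and conditioning covariates.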
Individual neighborhoods within large cities can benefit from independent analysis of public data in the context of ongoing efforts to improve the community. Yet existing tools for public data analysis and visualization are often mismatched to community needs, for reasons including geographic granularity that does not correspond to community boundaries, siloed data sets, inaccurate assumptions about data literacy, and limited user input in design and implementation phases. In Atlanta, this need is being addressed through a Data Dashboard developed under the auspices of the Westside Communities Alliance (WCA), a partnership between Georgia Tech and community stakeholders. In this paper we present an interactive analytic and visualization tool for public safety data within the WCA Data Dashboard. We describe a human-centered approach to understanding the needs of users and building accessible mapping tools for visualization and analysis. The tools include a variety of overlays that allow users to spatially correlate features of the built environment, such as vacant properties, with criminal activity as well as crime prevention efforts. We are in the final stages of developing the first version of the tool, with plans for a public release in fall of 2016.
Existing neural network-based autonomous systems have been shown to be vulnerable to adversarial attacks, so rigorous evaluation of their robustness is of great importance. However, evaluating robustness only under worst-case scenarios based on known attacks is not comprehensive, and some of these scenarios rarely occur in the real world. In addition, the distribution of safety-critical data is usually multimodal, while most traditional attacks and evaluation methods focus on a single modality. To address these challenges, we propose a flow-based multimodal safety-critical scenario generator for evaluating decision-making algorithms. The proposed generative model is optimized with weighted likelihood maximization, and a gradient-based sampling procedure is integrated to improve the sampling efficiency. The safety-critical scenarios are generated by querying the task algorithms, and the log-likelihood of the generated scenarios is in proportion to the risk level. Experiments on a self-driving task demonstrate our advantages in terms of testing efficiency and multimodal modeling capability. We evaluate six reinforcement learning algorithms with our generated traffic scenarios and provide empirical conclusions about their robustness.
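The idea of weighted likelihood maximization mentioned above can be illustrated with a deliberately simplified sketch. It swaps in a single one-dimensional Gaussian for the flow-based multimodal model of the abstract, purely so that the weighted maximum-likelihood fit has a closed form. The scenario parameter, the risk scores, and the function names are hypothetical.

```python
import math


def weighted_gaussian_mle(xs, weights):
    """Closed-form maximizer of sum_i w_i * log N(x_i | mu, var)."""
    total = sum(weights)
    mu = sum(w * x for x, w in zip(xs, weights)) / total
    var = sum(w * (x - mu) ** 2 for x, w in zip(xs, weights)) / total
    return mu, var


def log_likelihood(x, mu, var):
    """Log-density of a Gaussian with mean mu and variance var at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


# Hypothetical scenario parameter (e.g. initial gap to a lead vehicle, in meters)
# and hypothetical risk scores obtained by querying the algorithm under test.
gaps = [5.0, 10.0, 20.0, 40.0]
risk = [0.9, 0.6, 0.2, 0.1]

mu, var = weighted_gaussian_mle(gaps, risk)
print(f"weighted fit: mean={mu:.2f}, variance={var:.2f}")
print("log-likelihood at 5 m :", round(log_likelihood(5.0, mu, var), 3))
print("log-likelihood at 40 m:", round(log_likelihood(40.0, mu, var), 3))
```

Weighting by risk pulls the fitted density toward the riskier scenarios, so high-risk scenarios receive higher likelihood under the generator. A single Gaussian cannot represent several distinct risky modes, which is the limitation a multimodal model such as the one described in the abstract is meant to address.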