No Arabic abstract
Data collected about individuals is regularly used to make decisions that impact those same individuals. We consider settings where sensitive personal data is used to decide who will receive resources or benefits. While it is well known that there is a tradeoff between protecting privacy and the accuracy of decisions, we initiate a first-of-its-kind study into the impact of formally private mechanisms (based on differential privacy) on fair and equitable decision-making. We empirically investigate novel tradeoffs on two real-world decisions made using U.S. Census data (allocation of federal funds and assignment of voting rights benefits) as well as a classic apportionment problem. Our results show that if decisions are made using an $epsilon$-differentially private version of the data, under strict privacy constraints (smaller $epsilon$), the noise added to achieve privacy may disproportionately impact some groups over others. We propose novel measures of fairness in the context of randomized differentially private algorithms and identify a range of causes of outcome disparities.
People are rated and ranked, towards algorithmic decision making in an increasing number of applications, typically based on machine learning. Research on how to incorporate fairness into such tasks has prevalently pursued the paradigm of group fairness: giving adequate success rates to specifically protected groups. In contrast, the alternative paradigm of individual fairness has received relatively little attention, and this paper advances this less explored direction. The paper introduces a method for probabilistically mapping user records into a low-rank representation that reconciles individual fairness and the utility of classifiers and rankings in downstream applications. Our notion of individual fairness requires that users who are similar in all task-relevant attributes such as job qualification, and disregarding all potentially discriminating attributes such as gender, should have similar outcomes. We demonstrate the versatility of our method by applying it to classification and learning-to-rank tasks on a variety of real-world datasets. Our experiments show substantial improvements over the best prior work for this setting.
The fundamental goal of business data analysis is to improve business decisions using data. Business users such as sales, marketing, product, or operations managers often make decisions to achieve key performance indicator (KPI) goals such as increasing customer retention, decreasing cost, and increasing sales. To discover the relationship between data attributes hypothesized to be drivers and those corresponding to KPIs of interest, business users currently need to perform lengthy exploratory analyses, considering multitudes of combinations and scenarios, slicing, dicing, and transforming the data accordingly. For example, analyzing customer retention across quarters of the year or suggesting optimal media channels across strata of customers. However, the increasing complexity of datasets combined with the cognitive limitations of humans makes it challenging to carry over multiple hypotheses, even for simple datasets. Therefore mentally performing such analyses is hard. Existing commercial tools either provide partial solutions whose effectiveness remains unclear or fail to cater to business users. Here we argue for four functionalities that we believe are necessary to enable business users to interactively learn and reason about the relationships (functions) between sets of data attributes, facilitating data-driven decision making. We implement these functionalities in SystemD, an interactive visual analysis system enabling business users to experiment with the data by asking what-if questions. We evaluate the system through three business use cases: marketing mix modeling analysis, customer retention analysis, and deal closing analysis, and report on feedback from multiple business users. Overall, business users find SystemD intuitive and useful for quick testing and validation of their hypotheses around interested KPI as well as in making effective and fast data-driven decisions.
Situations involving competition for resources among entities can be modeled by the competitive multi-armed bandit (CMAB) problem, which relates to social issues such as maximizing the total outcome and achieving the fairest resource repartition among individuals. In these respects, the intrinsic randomness and global properties of quantum states provide ideal tools for obtaining optimal solutions to this problem. Based on the previous study of the CMAB problem in the two-arm, two-player case, this paper presents the theoretical principles necessary to find polarization-entangled N-photon states that can optimize the total resource output while ensuring equality among players. These principles were applied to two-, three-, four-, and five-player cases by using numerical simulations to reproduce realistic configurations and find the best strategies to overcome potential misalignment between the polarization measurement systems of the players. Although a general formula for the N-player case is not presented here, general derivation rules and a verification algorithm are proposed. This report demonstrates the potential usability of quantum states in collective decision making with limited, probabilistic resources, which could serve as a first step toward quantum-based resource allocation systems.
In modern information systems different information features, about the same individual, are often collected and managed by autonomous data collection services that may have different privacy policies. Answering many end-users legitimate queries requires the integration of data from multiple such services. However, data integration is often hindered by the lack of a trusted entity, often called a mediator, with which the services can share their data and delegate the enforcement of their privacy policies. In this paper, we propose a flexible privacy-preserving data integration approach for answering data integration queries without the need for a trusted mediator. In our approach, services are allowed to enforce their privacy policies locally. The mediator is considered to be untrusted, and only has access to encrypted information to allow it to link data subjects across the different services. Services, by virtue of a new privacy requirement, dubbed k-Protection, limiting privacy leaks, cannot infer information about the data held by each other. End-users, in turn, have access to privacy-sanitized data only. We evaluated our approach using an example and a real dataset from the healthcare application domain. The results are promising from both the privacy preservation and the performance perspectives.
Big Data is used by data miner for analysis purpose which may contain sensitive information. During the procedures it raises certain privacy challenges for researchers. The existing privacy preserving methods use different algorithms that results into limitation of data reconstruction while securing the sensitive data. This paper presents a clustering based privacy preservation probabilistic model of big data to secure sensitive information..model to attain minimum perturbation and maximum privacy. In our model, sensitive information is secured after identifying the sensitive data from data clusters to modify or generalize it.The resulting dataset is analysed to calculate the accuracy level of our model in terms of hidden data, lossed data as result of reconstruction. Extensive experiements are carried out in order to demonstrate the results of our proposed model. Clustering based Privacy preservation of individual data in big data with minimum perturbation and successful reconstruction highlights the significance of our model in addition to the use of standard performance evaluation measures.