The US Census Bureau plans to protect the privacy of 2020 Census respondents through its Disclosure Avoidance System (DAS), which attempts to achieve differential privacy guarantees by adding noise to the Census microdata. By applying redistricting simulation and analysis methods to DAS-protected 2010 Census data, we find that the protected data are not of sufficient quality for redistricting purposes. We demonstrate that the injected noise makes it impossible for states to accurately comply with the One Person, One Vote principle. Our analysis finds that the DAS-protected data are biased against certain areas, depending on voter turnout and partisan and racial composition, and that these biases lead to large and unpredictable errors in the analysis of partisan and racial gerrymanders. Finally, we show that the DAS algorithm does not universally protect respondent privacy. Based on the names and addresses of registered voters, we are able to predict their race as accurately using the DAS-protected data as when using the 2010 Census data. Despite this, the DAS-protected data still yield inaccurate estimates of the number of majority-minority districts. We conclude with recommendations for how the Census Bureau should proceed with privacy protection for the 2020 Census.
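As an illustration of the kind of name-and-geography race prediction referenced above, the following is a minimal sketch of a Bayesian Improved Surname Geocoding (BISG) style calculation, which combines surname-based race probabilities with the racial composition of a voter's census block via Bayes' rule. The probability tables, surnames, and block identifiers below are made-up placeholders rather than real Census data, and the abstract's actual analysis may differ in its exact methodology.

```python
# BISG-style sketch: P(race | surname, block) obtained by combining
# P(race | surname), P(race | block), and the national marginal P(race).
# All numbers below are illustrative placeholders, not real data.

SURNAME_PRIORS = {              # P(race | surname), e.g. from a surname list
    "GARCIA":     {"white": 0.05, "black": 0.01, "hispanic": 0.92, "other": 0.02},
    "WASHINGTON": {"white": 0.09, "black": 0.87, "hispanic": 0.02, "other": 0.02},
}

BLOCK_COMPOSITION = {           # P(race | block), from raw or DAS-protected counts
    "block_A": {"white": 0.60, "black": 0.20, "hispanic": 0.15, "other": 0.05},
    "block_B": {"white": 0.10, "black": 0.70, "hispanic": 0.15, "other": 0.05},
}

NATIONAL_SHARES = {"white": 0.64, "black": 0.12, "hispanic": 0.16, "other": 0.08}  # P(race)

def bisg_posterior(surname: str, block: str) -> dict:
    """Return P(race | surname, block) under the usual BISG independence assumption."""
    prior = SURNAME_PRIORS[surname]
    geo = BLOCK_COMPOSITION[block]
    unnorm = {r: prior[r] * geo[r] / NATIONAL_SHARES[r] for r in prior}
    total = sum(unnorm.values())
    return {r: p / total for r, p in unnorm.items()}

print(bisg_posterior("WASHINGTON", "block_A"))
```

In this setup, the geographic term is the only input that changes when the raw census counts are replaced by DAS-protected counts, which is why the prediction accuracy comparison isolates the effect of the injected noise.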
Random sampling of graph partitions under constraints has become a popular tool for evaluating legislative redistricting plans. Analysts detect partisan gerrymandering by comparing a proposed redistricting plan with an ensemble of sampled alternative plans. For successful application, sampling methods must scale to large maps with many districts, incorporate realistic legal constraints, and accurately and efficiently sample from a selected target distribution. Unfortunately, most existing methods struggle in at least one of these three areas. We present a new Sequential Monte Carlo (SMC) algorithm that draws representative redistricting plans from a realistic target distribution of choice. Because it samples directly, the SMC algorithm can more efficiently explore the relevant space of redistricting plans than existing Markov chain Monte Carlo (MCMC) algorithms, which yield dependent samples. Our algorithm can simultaneously incorporate several constraints commonly imposed in real-world redistricting problems, including equal population, compactness, and preservation of administrative boundaries. We validate the accuracy of the proposed algorithm by using a small map where all redistricting plans can be enumerated. We then apply the SMC algorithm to evaluate the partisan implications of several maps submitted by relevant parties in a recent high-profile redistricting case in the state of Pennsylvania. We find that the proposed algorithm is roughly 40 times more efficient in sampling from the target distribution than a state-of-the-art MCMC algorithm. Open-source software is available for implementing the proposed methodology.
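To make the propose/reweight/resample mechanics of sequential Monte Carlo concrete, below is a toy sketch on a deliberately simplified problem: cutting a one-dimensional line of precincts into contiguous districts of near-equal population. The actual algorithm described above operates on a precinct adjacency graph with spanning-tree splits, compactness, and administrative-boundary constraints; none of that is modeled here, and all names and parameter values are illustrative.

```python
# Toy SMC: grow plans one district per stage, reweight each partial plan
# toward an equal-population target, and resample the particle set.
import numpy as np

rng = np.random.default_rng(0)
pops = rng.integers(50, 150, size=12)          # toy precinct populations along a line
n, n_dist, n_particles = len(pops), 3, 2000
ideal = pops.sum() / n_dist
tau = 0.002                                    # equal-population strictness

cuts = [[0] for _ in range(n_particles)]       # each particle: list of cut points so far
log_w = np.zeros(n_particles)

for stage in range(1, n_dist):                 # draw one district per stage
    remaining = n_dist - stage                 # districts still to be drawn afterward
    for i in range(n_particles):
        lo = cuts[i][-1] + 1                   # leave at least 1 precinct per district
        hi = n - remaining + 1
        c = rng.integers(lo, hi)               # uniform proposal of the next cut point
        pop = pops[cuts[i][-1]:c].sum()
        # weight increment = target increment / proposal density
        log_w[i] += -tau * (pop - ideal) ** 2 + np.log(hi - lo)
        cuts[i].append(int(c))
    w = np.exp(log_w - log_w.max()); w /= w.sum()
    idx = rng.choice(n_particles, size=n_particles, p=w)   # resample particles
    cuts = [list(cuts[j]) for j in idx]
    log_w[:] = 0.0

# the final district is whatever remains; weight it, then report a few sampled plans
for i in range(n_particles):
    pop = pops[cuts[i][-1]:].sum()
    log_w[i] = -tau * (pop - ideal) ** 2
w = np.exp(log_w - log_w.max()); w /= w.sum()
sampled = rng.choice(n_particles, size=3, p=w)
print([cuts[j] + [n] for j in sampled])
```

Because each plan is built independently and corrected by importance weights, the resulting draws avoid the serial dependence that makes MCMC ensembles slow to mix, which is the efficiency advantage the abstract quantifies.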
The global infrastructure of the Web, designed as an open and transparent system, has a significant impact on our society. However, the Web is increasingly populated by algorithmic systems of corporate entities that neglect those principles. Typical representatives of these algorithmic systems are recommender systems, which influence our society both at the scale of global politics and during mundane shopping decisions. Recently, such recommender systems have come under critique for how they may strengthen existing biases or even generate new ones. To this end, designers and engineers are increasingly urged to make the functioning and purpose of recommender systems more transparent. Our research relates to the discourse of algorithm awareness, which reconsiders the role of algorithm visibility in interface design. We conducted online experiments with 105 participants recruited via MTurk for the recommender system Recoin, a gadget for Wikidata. In these experiments, we presented users with one of three designs of Recoin's user interface, each exhibiting a different degree of explainability and interactivity. Our findings include a positive correlation between comprehension of and trust in the algorithmic system in our interactive redesign. However, our results are not yet conclusive and suggest that the measures of comprehension, fairness, accuracy, and trust are not yet exhaustive for the empirical study of algorithm awareness. Our qualitative insights provide a first indication of further measures. Our study participants, for example, were less concerned with the details of understanding an algorithmic calculation than with who or what is judging the result of the algorithm.
COVID-19's impact has extended beyond personal and global health to our social life. In terms of digital presence, it is speculated that there has been a significant rise in cyberbullying during the pandemic. In this paper, we examine the hypothesis that cyberbullying and the reporting of such incidents have increased in recent times. To evaluate these speculations, we collected cyberbullying-related public tweets (N=454,046) posted between January 1, 2020 and June 7, 2020. A simple visual frequentist analysis ignores serial correlation and does not reveal changepoints as such. To address correlation and a relatively small number of time points, Bayesian estimation of the trends is proposed for the collected data via an autoregressive Poisson model. We show that this new Bayesian method detailed in this paper can clearly show the upward trend in cyberbullying-related tweets since mid-March 2020. However, this evidence itself does not signify a rise in cyberbullying; rather, it shows a correlation of the crisis with the discussion of such incidents by individuals. Our work emphasizes the critical issue of cyberbullying, shows how a global crisis impacts social media abuse, and provides a trend analysis model that can be utilized for social media data analysis in general.
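One common way to write down an autoregressive Poisson model of the kind described is with a log-link intensity that combines a linear time trend with dependence on the previous day's count. The sketch below simulates such a daily count series and evaluates its log-likelihood; the paper's exact likelihood, link function, and priors may differ, and all parameter values here are illustrative.

```python
# Sketch of an AR-Poisson count model:
#   y_t ~ Poisson(lambda_t),  log lambda_t = alpha + beta * t + rho * log(1 + y_{t-1})
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)

def simulate_counts(n_days, alpha=4.0, beta=0.01, rho=0.3):
    """Simulate daily tweet counts from the AR-Poisson model above."""
    y = np.zeros(n_days, dtype=int)
    y[0] = rng.poisson(np.exp(alpha))
    for t in range(1, n_days):
        log_lam = alpha + beta * t + rho * np.log1p(y[t - 1])
        y[t] = rng.poisson(np.exp(log_lam))
    return y

def log_likelihood(y, alpha, beta, rho):
    """Model log-likelihood, conditioning on the first observation."""
    t = np.arange(1, len(y))
    lam = np.exp(alpha + beta * t + rho * np.log1p(y[:-1]))
    return poisson.logpmf(y[1:], lam).sum()

y = simulate_counts(159)            # Jan 1 to Jun 7, 2020 spans 159 days
print(log_likelihood(y, 4.0, 0.01, 0.3))
```

This log-likelihood can be combined with priors and passed to any standard MCMC routine to obtain posterior estimates of the trend parameters, which is the role the Bayesian estimation plays in the analysis described above.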
After the peace agreement of 2016 with the FARC, the killings of social leaders have emerged as an important post-conflict challenge for Colombia. We present a data analysis based on official records obtained from the Colombian Attorney General's Office spanning the time period from 2012 to 2017. The results of the analysis show a drastic increase in the officially recorded number of killings of democratically elected leaders of community organizations, in particular those belonging to Juntas de Acción Comunal [Community Action Boards]. These are important entities that have been part of the Colombian democratic apparatus since 1958 and enable communities to advocate for their needs. We also describe how the data analysis guided a journalistic investigation that was motivated by the Colombian government's denial of the systematic nature of the killings of social leaders.
Voting is a means to agree on a collective decision based on available choices (e.g., candidates), where participants (voters) agree to abide by the outcome. To improve some features of e-voting, decentralized solutions based on a blockchain can be employed, where the blockchain represents a public bulletin board that, in contrast to a centralized bulletin board, provides 100% availability and censorship resistance. A blockchain ensures that all entities in the voting system have the same view of the actions made by others due to its immutable and append-only log. The existing blockchain-based boardroom voting solution called Open Vote Network (OVN) provides privacy of votes and perfect ballot secrecy, but it supports only two candidates. We present BBB-Voting, a blockchain-based approach for decentralized voting equivalent to OVN, but in contrast to it, BBB-Voting supports 1-out-of-$k$ choices and provides a fault tolerance mechanism that enables recovery from stalling participants. We provide a cost-optimized implementation using Ethereum, which we compare with OVN and show that our work decreases the costs for voters by 13.5% in terms of gas consumption. Next, we outline the extension of our implementation to scale to orders of magnitude more participants than in boardroom voting, while preserving the costs paid by the authority and participants -- we conducted proof-of-concept experiments with up to 1000 participants.
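To illustrate the self-tallying, 1-out-of-$k$ ballot encoding that OVN-style protocols build on, here is a toy sketch in which each voter blinds an encoded choice so that the blinding factors cancel when all published ballots are multiplied together. The zero-knowledge proofs of ballot well-formedness, the fault-tolerance mechanism, and the Ethereum smart contract are all omitted, and the tiny group parameters are for illustration only.

```python
# Toy self-tallying 1-out-of-k vote: ballot_i = Y_i^{x_i} * g^(M^{c_i}),
# where the Y_i are built from everyone's public keys so that the
# blinding terms cancel in the product of all ballots.
import random

P, Q, G = 2039, 1019, 4            # tiny safe-prime group; G generates the order-Q subgroup
N_VOTERS, K = 5, 3                 # 5 voters choosing among 3 candidates
M = 8                              # encoding base; must exceed the number of voters

random.seed(1)
x = [random.randrange(1, Q) for _ in range(N_VOTERS)]      # secret keys
pk = [pow(G, xi, P) for xi in x]                            # published g^{x_i}

def blinding_key(i):
    """Y_i = prod_{j<i} pk_j / prod_{j>i} pk_j (mod P)."""
    num = 1
    for j in range(i):
        num = num * pk[j] % P
    den = 1
    for j in range(i + 1, N_VOTERS):
        den = den * pk[j] % P
    return num * pow(den, -1, P) % P

choices = [0, 2, 1, 2, 2]                                   # each voter's candidate index
ballots = [pow(blinding_key(i), x[i], P) * pow(G, M ** c, P) % P
           for i, c in enumerate(choices)]

# Anyone can tally: blinding cancels, leaving g^(sum of M^{c_i}).
agg = 1
for b in ballots:
    agg = agg * b % P
for e in range(N_VOTERS * M ** (K - 1) + 1):                # brute-force the small exponent
    if pow(G, e, P) == agg:
        print("votes per candidate:", [e // M ** c % M for c in range(K)])  # -> [1, 1, 3]
        break
```

Encoding each choice as a power of a base larger than the number of voters is what lets a single aggregated exponent be decoded into per-candidate counts, which is the essence of extending the two-candidate scheme to 1-out-of-$k$ choices.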