ترغب بنشر مسار تعليمي؟ اضغط هنا

Studying Up Machine Learning Data: Why Talk About Bias When We Mean Power?

264   0   0.0 ( 0 )
 نشر من قبل Milagros Miceli
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Research in machine learning (ML) has primarily argued that models trained on incomplete or biased datasets can lead to discriminatory outputs. In this commentary, we propose moving the research focus beyond bias-oriented framings by adopting a power-aware perspective to study up ML datasets. This means accounting for historical inequities, labor conditions, and epistemological standpoints inscribed in data. We draw on HCI and CSCW work to support our argument, critically analyze previous research, and point at two co-existing lines of work within our community -- one bias-oriented, the other power-aware. This way, we highlight the need for dialogue and cooperation in three areas: data quality, data work, and data documentation. In the first area, we argue that reducing societal problems to bias misses the context-based nature of data. In the second one, we highlight the corporate forces and market imperatives involved in the labor of data workers that subsequently shape ML datasets. Finally, we propose expanding current transparency-oriented efforts in dataset documentation to reflect the social contexts of data design and production.



قيم البحث

اقرأ أيضاً

173 - Ewan Cameron 2014
In astronomical and cosmological studies one often wishes to infer some properties of an infinite-dimensional field indexed within a finite-dimensional metric space given only a finite collection of noisy observational data. Bayesian inference offers an increasingly-popular strategy to overcome the inherent ill-posedness of this signal reconstruction challenge. However, there remains a great deal of confusion within the astronomical community regarding the appropriate mathematical devices for framing such analyses and the diversity of available computational procedures for recovering posterior functionals. In this brief research note I will attempt to clarify both these issues from an applied statistics perpective, with insights garnered from my post-astronomy experiences as a computational Bayesian / epidemiological geostatistician.
The first responder community has traditionally relied on calls from the public, officially-provided geographic information and maps for coordinating actions on the ground. The ubiquity of social media platforms created an opportunity for near real-t ime sensing of the situation (e.g. unfolding weather events or crises) through volunteered geographic information. In this article, we provide an overview of the design process and features of the Social Media Analytics Reporting Toolkit (SMART), a visual analytics platform developed at Purdue University for providing first responders with real-time situational awareness. We attribute its successful adoption by many first responders to its user-centered design, interactive (geo)visualizations and interactive machine learning, giving users control over analysis.
Culture is core to human civilization, and is essential for human intellectual achievements in social context. Culture also influences how humans work together, perform particular task and overall lifestyle and dealing with other groups of civilizati on. Thus, culture is concerned with establishing shared ideas, particularly those playing a key role in success. Does it impact on how two individuals can work together in achieving certain goals? In this paper, we establish a means to derive cultural association and map it to culturally mediated success. Human interactions with the environment are typically in the form of expressions. Association between culture and behavior produce similar beliefs which lead to common principles and actions, while cultural similarity as a set of common expressions and responses. To measure cultural association among different candidates, we propose the use of a Graphical Association Method (GAM). The behaviors of candidates are captured through series of expressions and represented in the graphical form. The association among corresponding node and core nodes is used for the same. Our approach provides a number of interesting results and promising avenues for future applications.
Algorithmic systems---from rule-based bots to machine learning classifiers---have a long history of supporting the essential work of content moderation and other curation work in peer production projects. From counter-vandalism to task routing, basic machine prediction has allowed open knowledge projects like Wikipedia to scale to the largest encyclopedia in the world, while maintaining quality and consistency. However, conversations about how quality control should work and what role algorithms should play have generally been led by the expert engineers who have the skills and resources to develop and modify these complex algorithmic systems. In this paper, we describe ORES: an algorithmic scoring service that supports real-time scoring of wiki edits using multiple independent classifiers trained on different datasets. ORES decouples several activities that have typically all been performed by engineers: choosing or curating training data, building models to serve predictions, auditing predictions, and developing interfaces or automated agents that act on those predictions. This meta-algorithmic system was designed to open up socio-technical conversations about algorithms in Wikipedia to a broader set of participants. In this paper, we discuss the theoretical mechanisms of social change ORES enables and detail case studies in participatory machine learning around ORES from the 5 years since its deployment.
To ensure accountability and mitigate harm, it is critical that diverse stakeholders can interrogate black-box automated systems and find information that is understandable, relevant, and useful to them. In this paper, we eschew prior expertise- and role-based categorizations of interpretability stakeholders in favor of a more granular framework that decouples stakeholders knowledge from their interpretability needs. We characterize stakeholders by their formal, instrumental, and personal knowledge and how it manifests in the contexts of machine learning, the data domain, and the general milieu. We additionally distill a hierarchical typology of stakeholder needs that distinguishes higher-level domain goals from lower-level interpretability tasks. In assessing the descriptive, evaluative, and generative powers of our framework, we find our more nuanced treatment of stakeholders reveals gaps and opportunities in the interpretability literature, adds precision to the design and comparison of user studies, and facilitates a more reflexive approach to conducting this research.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا