Who Tags What? An Analysis Framework

360 0 0.0 ( 0 )

Download Cite

Added by Mahashweta Das

Publication date 2012

fields Informatics Engineering

and research's language is English

Authors Mahashweta Das - Saravanan Thirumuruganathan - Sihem Amer-Yahia

Databases

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

The rise of Web 2.0 is signaled by sites such as Flickr, del.icio.us, and YouTube, and social tagging is essential to their success. A typical tagging action involves three components, user, item (e.g., photos in Flickr), and tags (i.e., words or phrases). Analyzing how tags are assigned by certain users to certain items has important implications in helping users search for desired information. In this paper, we explore common analysis tasks and propose a dual mining framework for social tagging behavior mining. This framework is centered around two opposing measures, similarity and diversity, being applied to one or more tagging components, and therefore enables a wide range of analysis scenarios such as characterizing similar users tagging diverse items with similar tags, or diverse users tagging similar items with diverse tags, etc. By adopting different concrete measures for similarity and diversity in the framework, we show that a wide range of concrete analysis problems can be defined and they are NP-Complete in general. We design efficient algorithms for solving many of those problems and demonstrate, through comprehensive experiments over real data, that our algorithms significantly out-perform the exact brute-force approach without compromising analysis result quality.

rate research

Who Gets What, According to Whom? An Analysis of Fairness Perceptions in Service Allocation

177 - Jacqueline Hannan , Huei-Yen Winnie Chen , Kenneth Joseph 2021

Algorithmic fairness research has traditionally been linked to the disciplines of philosophy, ethics, and economics, where notions of fairness are prescriptive and seek objectivity. Increasingly, however, scholars are turning to the study of what different people perceive to be fair, and how these perceptions can or should help to shape the design of machine learning, particularly in the policy realm. The present work experimentally explores five novel research questions at the intersection of the Who, What, and How of fairness perceptions. Specifically, we present the results of a multi-factor conjoint analysis study that quantifies the effects of the specific context in which a question is asked, the framing of the given question, and who is answering it. Our results broadly suggest that the Who and What, at least, matter in ways that are 1) not easily explained by any one theoretical perspective, 2) have critical implications for how perceptions of fairness should be measured and/or integrated into algorithmic decision-making systems.

Computers and Society

Augmenting Decision Making via Interactive What-If Analysis

226 - Sneha Gathani , Madelon Hulsebos , James Gale 2021

The fundamental goal of business data analysis is to improve business decisions using data. Business users such as sales, marketing, product, or operations managers often make decisions to achieve key performance indicator (KPI) goals such as increasing customer retention, decreasing cost, and increasing sales. To discover the relationship between data attributes hypothesized to be drivers and those corresponding to KPIs of interest, business users currently need to perform lengthy exploratory analyses, considering multitudes of combinations and scenarios, slicing, dicing, and transforming the data accordingly. For example, analyzing customer retention across quarters of the year or suggesting optimal media channels across strata of customers. However, the increasing complexity of datasets combined with the cognitive limitations of humans makes it challenging to carry over multiple hypotheses, even for simple datasets. Therefore mentally performing such analyses is hard. Existing commercial tools either provide partial solutions whose effectiveness remains unclear or fail to cater to business users. Here we argue for four functionalities that we believe are necessary to enable business users to interactively learn and reason about the relationships (functions) between sets of data attributes, facilitating data-driven decision making. We implement these functionalities in SystemD, an interactive visual analysis system enabling business users to experiment with the data by asking what-if questions. We evaluate the system through three business use cases: marketing mix modeling analysis, customer retention analysis, and deal closing analysis, and report on feedback from multiple business users. Overall, business users find SystemD intuitive and useful for quick testing and validation of their hypotheses around interested KPI as well as in making effective and fast data-driven decisions.

Databases Human-Computer Interaction Machine Learning

KGClean: An Embedding Powered Knowledge Graph Cleaning Framework

243 - Congcong Ge , Yunjun Gao , Honghui Weng 2020

The quality assurance of the knowledge graph is a prerequisite for various knowledge-driven applications. We propose KGClean, a novel cleaning framework powered by knowledge graph embedding, to detect and repair the heterogeneous dirty data. In contrast to previous approaches that either focus on filling missing data or clean errors violated limited rules, KGClean enables (i) cleaning both missing data and other erroneous values, and (ii) mining potential rules automatically, which expands the coverage of error detecting. KGClean first learns data representations by TransGAT, an effective knowledge graph embedding model, which gathers the neighborhood information of each data and incorporates the interactions among data for casting data to continuous vector spaces with rich semantics. KGClean integrates an active learning-based classification model, which identifies errors with a small seed of labels. KGClean utilizes an efficient PRO-repair strategy to repair errors using a novel concept of propagation power. Extensive experiments on four typical knowledge graphs demonstrate the effectiveness of KGClean in practice.

Databases

TrQuery: An Embedding-based Framework for Recommanding SPARQL Queries

221 - Lijing Zhang , Xiaowang Zhang , Zhiyong Feng 2018

In this paper, we present an embedding-based framework (TrQuery) for recommending solutions of a SPARQL query, including approximate solutions when exact querying solutions are not available due to incompleteness or inconsistencies of real-world RDF data. Within this framework, embedding is applied to score solutions together with edit distance so that we could obtain more fine-grained recommendations than those recommendations via edit distance. For instance, graphs of two querying solutions with a similar structure can be distinguished in our proposed framework while the edit distance depending on structural difference becomes unable. To this end, we propose a novel score model built on vector space generated in embedding system to compute the similarity between an approximate subgraph matching and a whole graph matching. Finally, we evaluate our approach on large RDF datasets DBpedia and YAGO, and experimental results show that TrQuery exhibits an excellent behavior in terms of both effectiveness and efficiency.

Databases Artificial Intelligence

CIAO: An Optimization Framework for Client-Assisted Data Loading

62 - Cong Ding , Dixin Tang , Xi Liang 2021

Data loading has been one of the most common performance bottlenecks for many big data applications, especially when they are running on inefficient human-readable formats, such as JSON or CSV. Parsing, validating, integrity checking and data structure maintenance are all computationally expensive steps in loading these formats. Regardless of these costs, many records may be filtered later during query evaluation due to highly selective predicates -- resulting in wasted computation. Meanwhile, the computing power of client ends is typically not exploited. Here, we explore investing limited cycles of clients on prefiltering to accelerate data loading and enable data skipping for query execution. In this paper, we present CIAO, a tunable system to enable client cooperation with the server to enable efficient partial loading and data skipping for a given workload. We proposed an efficient algorithm that would select a near-optimal predicate set to push down within a given budget. Moreover, CIAO will address the trade-off between client cost and server savings by setting different budgets for different clients. We implemented CIAO and evaluated its performance on three real-world datasets. Our experimental results show that the system substantially accelerates data loading by up to 21x and query execution by up to 23x and improves end-to-end performance by up to 19x within a budget of 1.0 microseconds latency per record on clients.

Databases