
Emo, Love, and God: Making Sense of Urban Dictionary, a Crowd-Sourced Online Dictionary


Added by Taha Yasseri

Publication date: 2017

Field: Informatics Engineering

Research language: English

Authors: Dong Nguyen, Barbara McGillivray, Taha Yasseri


The Internet facilitates large-scale collaborative projects, and the emergence of Web 2.0 platforms, where producers and consumers of content unify, has drastically changed the information market. On the one hand, the promise of the wisdom of the crowd has inspired successful projects such as Wikipedia, which has become the primary source of crowd-based information in many languages. On the other hand, the decentralized and often unmonitored environment of such projects may make them susceptible to low-quality content. In this work, we focus on Urban Dictionary, a crowd-sourced online dictionary. We combine computational methods with qualitative annotation and shed light on the overall features of Urban Dictionary in terms of growth, coverage and types of content. We measure a high presence of opinion-focused entries, as opposed to the meaning-focused entries that we expect from traditional dictionaries. Furthermore, Urban Dictionary covers many informal, unfamiliar words as well as proper nouns. Urban Dictionary also contains offensive content, but highly offensive content tends to receive lower scores through the dictionary's voting system. The low threshold to include new material in Urban Dictionary enables quick recording of new words and new meanings, but the resulting heterogeneous content can pose challenges in using Urban Dictionary as a source to study language innovation.


FEWS: Large-Scale, Low-Shot Word Sense Disambiguation with the Dictionary

80 - Terra Blevins , Mandar Joshi , 2021

Current models for Word Sense Disambiguation (WSD) struggle to disambiguate rare senses, despite reaching human performance on global WSD metrics. This stems from a lack of data for both modeling and evaluating rare senses in existing WSD datasets. In this paper, we introduce FEWS (Few-shot Examples of Word Senses), a new low-shot WSD dataset automatically extracted from example sentences in Wiktionary. FEWS has high sense coverage across different natural language domains and provides: (1) a large training set that covers many more senses than previous datasets and (2) a comprehensive evaluation set containing few- and zero-shot examples of a wide variety of senses. We establish baselines on FEWS with knowledge-based and neural WSD approaches and present transfer learning experiments demonstrating that models additionally trained with FEWS better capture rare senses in existing WSD datasets. Finally, we find humans outperform the best baseline models on FEWS, indicating that FEWS will support significant future work on low-shot WSD.

Computation and Language

Making Online Communities Better: A Taxonomy of Community Values on Reddit

155 - Galen Weld , Amy X. Zhang , Tim Althoff 2021

Many researchers studying online social communities seek to make such communities better. However, understanding what "better" means is challenging, due to the divergent opinions of community members and the multitude of possible community values, which often conflict with one another. Community members' own values for their communities are not well understood, and how these values align with one another is an open question. Previous research has mostly focused on specific and comparatively well-defined harms within online communities, such as harassment, rule-breaking, and misinformation. In this work, we ask 39 community members on Reddit to describe their values for their communities. We gather 301 responses in members' own words, spanning 125 unique communities, and use iterative categorization to produce a taxonomy of 29 different community values across 9 major categories. We find that members value a broad range of topics ranging from technical features to the diversity of the community, and most frequently prioritize content quality. We identify important understudied topics such as content quality and community size, highlight where values conflict with one another, and call for research into governance methods for communities that protect vulnerable members.

Human-Computer Interaction Computers and Society Social and Information Networks

Restoring Hebrew Diacritics Without a Dictionary

55 - Elazar Gershuni , Yuval Pinter 2021

We demonstrate that it is feasible to diacritize Hebrew script without any human-curated resources other than plain diacritized text. We present NAKDIMON, a two-layer character-level LSTM that performs on par with much more complicated curation-dependent systems across a diverse array of modern Hebrew sources.

Computation and Language

NOODL: Provable Online Dictionary Learning and Sparse Coding

105 - Sirisha Rambhatla , Xingguo Li , 2019

We consider the dictionary learning problem, where the aim is to model the given data as a linear combination of a few columns of a matrix known as a dictionary, where the sparse weights forming the linear combination are known as coefficients. Since the dictionary and coefficients parameterizing the linear model are unknown, the corresponding optimization is inherently non-convex. This was a major challenge until recently, when provable algorithms for dictionary learning were proposed. Yet, these provide guarantees only on the recovery of the dictionary, without explicit recovery guarantees on the coefficients. Moreover, any estimation error in the dictionary adversely impacts the ability to successfully localize and estimate the coefficients. This potentially limits the utility of existing provable dictionary learning methods in applications where coefficient recovery is of interest. To this end, we develop NOODL: a simple Neurally plausible alternating Optimization-based Online Dictionary Learning algorithm, which recovers both the dictionary and coefficients exactly at a geometric rate, when initialized appropriately. Our algorithm, NOODL, is also scalable and amenable to large-scale distributed implementations in neural architectures, by which we mean that it only involves simple linear and non-linear operations. Finally, we corroborate these theoretical results via experimental evaluation of the proposed algorithm with the current state-of-the-art techniques. Keywords: dictionary learning, provable dictionary learning, online dictionary learning, non-convex, sparse coding, support recovery, iterative hard thresholding, matrix factorization, neural architectures, neural networks, noodl, sparse representations, sparse signal processing.

Machine Learning Neural and Evolutionary Computing Machine Learning
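
The sparse linear model described in the NOODL abstract above (data as a combination of a few dictionary columns) can be illustrated with a toy example. This is only a sketch of the problem setup, not the NOODL algorithm: the 4x4 scaled Hadamard dictionary, the function names `synthesize` and `hard_threshold_estimate`, and the threshold `tau` are all assumptions chosen for illustration. Because this assumed dictionary is known and has orthonormal columns, a single correlate-and-threshold step recovers the coefficients; the learning problem in the abstract is harder because the dictionary itself is unknown.

```python
# Hypothetical dictionary: a scaled 4x4 Hadamard matrix, whose columns are orthonormal.
H = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]
A = [[0.5 * v for v in row] for row in H]


def synthesize(A, x):
    """Return y = A x, i.e. a linear combination of the columns of A."""
    return [sum(a * c for a, c in zip(row, x)) for row in A]


def hard_threshold_estimate(A, y, tau):
    """Correlate y with each column of A, then zero out entries with magnitude <= tau."""
    n_rows, n_cols = len(A), len(A[0])
    corr = [sum(A[i][j] * y[i] for i in range(n_rows)) for j in range(n_cols)]
    return [c if abs(c) > tau else 0.0 for c in corr]


# A 2-sparse coefficient vector: only columns 1 and 3 of the dictionary are used.
x = [0.0, 2.0, 0.0, -1.0]
y = synthesize(A, x)
x_hat = hard_threshold_estimate(A, y, tau=0.5)
support = [j for j, c in enumerate(x_hat) if c != 0.0]
```

Here `support` lists which dictionary columns were used to build `y`, which is the "localize" part of coefficient recovery mentioned in the abstract; `x_hat` holds the estimated weights.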

Making Sense of Word Embeddings

70 - Maria Pelevina , Nikolay Arefyev , Chris Biemann 2017

We present a simple yet effective approach for learning word sense embeddings. In contrast to existing techniques, which either directly learn sense representations from corpora or rely on sense inventories from lexical resources, our approach can induce a sense inventory from existing word embeddings via clustering of ego-networks of related words. An integrated WSD mechanism enables labeling of words in context with learned sense vectors, which gives rise to downstream applications. Experiments show that the performance of our method is comparable to state-of-the-art unsupervised WSD systems.

Computation and Language
