Structured Embedding Models for Grouped Data

151 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Maja Rudolph

تاريخ النشر 2017

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Maja Rudolph - Francisco Ruiz - Susan Athey

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Word embeddings are a powerful approach for analyzing language, and exponential family embeddings (EFE) extend them to other types of data. Here we develop structured exponential family embeddings (S-EFE), a method for discovering embeddings that vary across related groups of data. We study how the word usage of U.S. Congressional speeches varies across states and party affiliation, how words are used differently across sections of the ArXiv, and how the co-purchase patterns of groceries can vary across seasons. Key to the success of our method is that the groups share statistical information. We develop two sharing strategies: hierarchical modeling and amortization. We demonstrate the benefits of this approach in empirical studies of speeches, abstracts, and shopping baskets. We show how S-EFE enables group-specific interpretation of word usage, and outperforms EFE in predicting held-out data.

قيم البحث

88 - Tasnim Mohiuddin , Thanh-Tung Nguyen , Shafiq Joty 2019

We address the problem of speech act recognition (SAR) in asynchronous conversations (forums, emails). Unlike synchronous conversations (e.g., meetings, phone), asynchronous domains lack large labeled datasets to train an effective SAR model. In this paper, we propose methods to effectively leverage abundant unlabeled conversational data and the available labeled data from synchronous domains. We carry out our research in three main steps. First, we introduce a neural architecture based on hierarchical LSTMs and conditional random fields (CRF) for SAR, and show that our method outperforms existing methods when trained on in-domain data only. Second, we improve our initial SAR models by semi-supervised learning in the form of pretrained word embeddings learned from a large unlabeled conversational corpus. Finally, we employ adversarial training to improve the results further by leveraging the labeled data from synchronous domains and by explicitly modeling the distributional shift in two domains.

الحساب واللغة التعلم الآلي التعلم الالي

Diverse Embedding Neural Network Language Models

603 - Kartik Audhkhasi , Abhinav Sethy , Bhuvana Ramabhadran 2014

We propose Diverse Embedding Neural Network (DENN), a novel architecture for language models (LMs). A DENNLM projects the input word history vector onto multiple diverse low-dimensional sub-spaces instead of a single higher-dimensional sub-space as i n conventional feed-forward neural network LMs. We encourage these sub-spaces to be diverse during network training through an augmented loss function. Our language modeling experiments on the Penn Treebank data set show the performance benefit of using a DENNLM.

الحساب واللغة التعلم الآلي الحوسبة العصبية والتطورية

MLPruning: A Multilevel Structured Pruning Framework for Transformer-based Models

177 - Zhewei Yao , Linjian Ma , Sheng Shen 2021

Pruning is an effective method to reduce the memory footprint and computational cost associated with large natural language processing models. However, current approaches either only explore head pruning, which has a limited pruning ratio, or only fo cus on unstructured pruning, which has negligible effects on the real inference time and/or power consumption. To address these challenges, we develop a novel MultiLevel structured Pruning (MLPruning) framework, which uses three different levels of structured pruning: head pruning, row pruning, and block-wise sparse pruning. We propose using a learnable Top-k threshold, which employs an adaptive regularization to adjust the regularization magnitude adaptively, to select appropriate pruning ratios for different weight matrices. We also propose a two-step pipeline to combine block-wise pruning with head/row pruning to achieve high structured pruning ratios with minimum accuracy degradation. Our empirical results show that for bertbase, with textapprox20% of remaining weights, OURS can achieve an accuracy that is comparable to the full model on QQP/MNLI/squad, with up to textapprox3.69x speedup. Our framework has been open sourced~cite{codebase}.

الحساب واللغة التعلم الآلي

Structured Content Preservation for Unsupervised Text Style Transfer

162 - Youzhi Tian , Zhiting Hu , Zhou Yu 2018

Text style transfer aims to modify the style of a sentence while keeping its content unchanged. Recent style transfer systems often fail to faithfully preserve the content after changing the style. This paper proposes a structured content preserving model that leverages linguistic information in the structured fine-grained supervisions to better preserve the style-independent content during style transfer. In particular, we achieve the goal by devising rich model objectives based on both the sentences lexical information and a language model that conditions on content. The resulting model therefore is encouraged to retain the semantic meaning of the target sentences. We perform extensive experiments that compare our model to other existing approaches in the tasks of sentiment and political slant transfer. Our model achieves significant improvement in terms of both content preservation and style transfer in automatic and human evaluation.

الحساب واللغة التعلم الآلي التعلم الالي

On SkipGram Word Embedding Models with Negative Sampling: Unified Framework and Impact of Noise Distributions

116 - Ziqiao Wang , Yongyi Mao , Hongyu Guo 2020

SkipGram word embedding models with negative sampling, or SGN in short, is an elegant family of word embedding models. In this paper, we formulate a framework for word embedding, referred to as Word-Context Classification (WCC), that generalizes SGN to a wide family of models. The framework, utilizing some noise examples, is justified through a theoretical analysis. The impact of noise distribution on the learning of the WCC embedding models is studied experimentally, suggesting that the best noise distribution is in fact the data distribution, in terms of both the embedding performance and the speed of convergence during training. Along our way, we discover several novel embedding models that outperform the existing WCC models.

الحساب واللغة التعلم الآلي التعلم الالي