Sample-efficient Linguistic Generalizations through Program Synthesis: Experiments with Phonology Problems

Added by: Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Keywords: sample-efficient linguistic generalizations, linguistic generalizations, experiments with phonology

Abstract

Neural models excel at extracting statistical patterns from large amounts of data, but struggle to learn patterns or reason about language from only a few examples. In this paper, we ask: Can we learn explicit rules that generalize well from only a few examples? We explore this question using program synthesis. We develop a synthesis model to learn phonology rules as programs in a domain-specific language. We test the ability of our models to generalize from few training examples using our new dataset of problems from the Linguistics Olympiad, a challenging set of tasks that require strong linguistic reasoning ability. In addition to being highly sample-efficient, our approach generates human-readable programs, and allows control over the generalizability of the learnt programs.
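To make "learning rules as programs" concrete, here is a minimal sketch of enumerative program synthesis over a toy rule language of context-sensitive rewrites (A becomes B between L and R). The rule format, the functions `apply_rule` and `synthesize`, the simplicity ordering, and the example data are all hypothetical illustrations; they are not the paper's domain-specific language or synthesis model.

```python
from itertools import product


def apply_rule(rule, word):
    """Apply a rewrite rule (target, replacement, left, right) left to right.

    `target` becomes `replacement` wherever it is preceded by `left` and
    followed by `right` in the input word; "" means "no context constraint".
    Contexts are checked against the original input word, not the output.
    """
    target, replacement, left, right = rule
    out, i = [], 0
    while i < len(word):
        if (word.startswith(target, i)
                and word[:i].endswith(left)
                and word.startswith(right, i + len(target))):
            out.append(replacement)
            i += len(target)
        else:
            out.append(word[i])
            i += 1
    return "".join(out)


def synthesize(examples, alphabet):
    """Return the first rule (fewest context symbols) consistent with all examples."""
    symbols = sorted(alphabet)
    contexts = [""] + symbols
    candidates = [
        (t, r, l, rt)
        for t, r in product(symbols, symbols + [""])  # r == "" means deletion
        if t != r
        for l, rt in product(contexts, contexts)
    ]
    # Crude simplicity bias: context-free rules first, then rules with more context.
    candidates.sort(key=lambda c: len(c[2]) + len(c[3]))
    for rule in candidates:
        if all(apply_rule(rule, x) == y for x, y in examples):
            return rule
    return None  # no rule in this tiny hypothesis space fits


# Toy data (invented for illustration): t is voiced to d between two a's.
examples = [("ata", "ada"), ("tak", "tak"), ("kat", "kat"), ("kata", "kada")]
alphabet = {ch for pair in examples for word in pair for ch in word}
rule = synthesize(examples, alphabet)
print(rule)                        # ('t', 'd', 'a', 'a')
print(apply_rule(rule, "patata"))  # padada
```

Four pairs are enough to pin down the rule in this toy space, and the result is a readable program that also applies to an unseen word. A realistic rule language would be far richer than this four-field tuple.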

Related research

Automatically Exposing Problems with Neural Dialog Models

Association for Computational Linguistics, 2021 (article)

Neural dialog models are known to suffer from problems such as generating unsafe and inconsistent responses. Even though these problems are crucial and prevalent, they are mostly identified manually by model designers through interactions. Some recent work instructs crowdworkers to goad the bots into triggering such problems. However, humans leverage superficial clues such as hate speech, while leaving systematic problems undetected. In this paper, we propose two methods, including one based on reinforcement learning, to automatically trigger a dialog model into generating problematic responses. We show the effectiveness of our methods in exposing safety and contradiction issues with state-of-the-art dialog models.

Efficient Machine Translation with Model Pruning and Quantization

Association for Computational Linguistics, 2021 (article)

We participated in all tracks of the WMT 2021 efficient machine translation task: single-core CPU, multi-core CPU, and GPU hardware, under both throughput and latency conditions. Our submissions combine several efficiency strategies: knowledge distillation, a simpler simple recurrent unit (SSRU) decoder with one or two layers, lexical shortlists, smaller numerical formats, and pruning. For the CPU track, we used quantized 8-bit models. For the GPU track, we experimented with FP16 and 8-bit integers on tensor cores. Some of our submissions optimize for size via 4-bit log quantization and by omitting the lexical shortlist. We have extended pruning to more parts of the network, emphasizing component- and block-level pruning, which, unlike coefficient-wise pruning, actually improves speed.
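The 8-bit models mentioned above store weights as small integers plus a scale factor; 8 bits per weight instead of 32 for float32 cuts storage by a factor of four. As a generic illustration only (plain symmetric, per-tensor quantization; the abstract does not specify the exact scheme the submissions used, and the function names are hypothetical), a minimal sketch:

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: each w is approximated by scale * q."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clip to the signed range [-127, 127].
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map 8-bit integers back to approximate float values."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
print(q)  # [50, -127, 0, 100]
restored = dequantize(q, scale)
# Rounding error per weight is at most half a quantization step.
print(max(abs(w - r) for w, r in zip(weights, restored)) <= scale / 2)  # True
```

The largest-magnitude weight maps to the edge of the integer range, so no value is clipped and the only loss is rounding.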

To Block or not to Block: Experiments with Machine Learning for News Comment Moderation

Association for Computational Linguistics, 2021 (article)

Today, news media organizations regularly engage with readers by enabling them to comment on news articles. This creates the need for comment moderation and the removal of disallowed comments, a time-consuming task often performed by human moderators. In this paper, we approach automatic news comment moderation as the classification of comments into blocked and not-blocked categories. We construct a novel dataset of annotated English comments, experiment with cross-lingual transfer of comment labels, and evaluate several machine learning models on datasets of Croatian and Estonian news comments. Team name: SuperAdmin; Challenge: Detection of blocked comments; Tools/models: CroSloEn BERT, FinEst BERT, 24Sata comment dataset, Ekspress comment dataset.

Country-level Arabic Dialect Identification using RNNs with and without Linguistic Features

Association for Computational Linguistics, 2021 (article)

This work investigates the value of augmenting recurrent neural networks with feature engineering for the Second Nuanced Arabic Dialect Identification (NADI) Subtask 1.2: country-level dialectal Arabic (DA) identification. We compare the performance of a simple word-level LSTM using pretrained embeddings with one enhanced with feature embeddings for engineered linguistic features. Our results show that adding explicit features to the LSTM is detrimental to performance. We attribute this performance loss to the bivalency of some linguistic items in some texts, the ubiquity of topics, and participant mobility.

Does BERT Learn as Humans Perceive? Understanding Linguistic Styles through Lexica

Association for Computational Linguistics, 2021 (article)

People convey their intentions and attitudes through the linguistic styles of the text they write. In this study, we investigate lexicon usage across styles through two lenses, human perception and machine word importance, since words differ in the strength of the stylistic cues they provide. To collect labels of human perception, we curate a new dataset, Hummingbird, on top of benchmark style datasets. We have crowd workers highlight the representative words in the text that make them think the text has the following styles: politeness, sentiment, offensiveness, and five emotion types. We then compare these human word labels with word importance derived from a popular fine-tuned style classifier such as BERT. Our results show that BERT often treats content words that are not relevant to the target style as important for style prediction, whereas humans do not perceive them this way, even though for some styles (e.g., positive sentiment and joy) human- and machine-identified words share significant overlap.