Large pre-trained language representation models (LMs) have recently achieved a remarkable string of successes on many NLP tasks. In 2018 BERT, and later its successors (e.g. RoBERTa), obtained state-of-the-art results on classical benchmarks such as GLUE. Since then, works on adversarial attacks have been published to test their generalization properties and robustness. In this work, we design the Evolutionary Fooling Sentences Generator (EFSG), a model- and task-agnostic adversarial attack algorithm built on an evolutionary approach to generate false-positive sentences for binary classification tasks. We successfully apply EFSG to the CoLA and MRPC tasks, on BERT and RoBERTa, and compare their performance. The results show the presence of weak spots in state-of-the-art LMs. Finally, we test adversarial training as a data-augmentation defence against EFSG, obtaining stronger models with no loss of accuracy when tested on the original datasets.
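To make the EFSG idea concrete, here is a minimal sketch of the evolutionary loop the abstract describes: a population of candidate sentences is scored by the target classifier, the highest-scoring (most fooling) candidates survive, and mutated copies refill the population until a confident false positive emerges. All names (efsg_sketch, classifier_score, VOCAB), the point-mutation operator, and the hash-based toy scorer standing in for a fine-tuned BERT/RoBERTa are illustrative assumptions, not the paper's actual implementation.

```python
import hashlib
import random

# Toy vocabulary; a real attack would sample from the target LM's vocabulary.
VOCAB = ["the", "a", "dog", "idea", "runs", "sleeps", "quickly",
         "green", "on", "under", "table", "cloud"]


def classifier_score(sentence: str) -> float:
    """Stand-in for P(positive | sentence) from a fine-tuned model
    (e.g. BERT on CoLA). Deterministic hash-based toy score so the
    sketch runs without any model weights."""
    digest = hashlib.md5(sentence.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") / 2**32


def mutate(words, rng):
    """Point mutation: replace one random position with a random vocab word."""
    words = list(words)
    words[rng.randrange(len(words))] = rng.choice(VOCAB)
    return tuple(words)


def efsg_sketch(length=6, pop_size=40, keep=10, generations=100, seed=0):
    """Evolve word sequences that the classifier (wrongly) scores as
    positive: the essence of a false-positive fooling-sentence search."""
    rng = random.Random(seed)
    population = [tuple(rng.choice(VOCAB) for _ in range(length))
                  for _ in range(pop_size)]
    best = max(population, key=lambda w: classifier_score(" ".join(w)))
    for _ in range(generations):
        # Selection: keep the sentences the classifier is most fooled by.
        survivors = sorted(population,
                           key=lambda w: classifier_score(" ".join(w)),
                           reverse=True)[:keep]
        best = survivors[0]
        if classifier_score(" ".join(best)) > 0.99:
            break  # confident false positive found
        # Reproduction: refill the population with mutated survivors.
        population = survivors + [mutate(rng.choice(survivors), rng)
                                  for _ in range(pop_size - keep)]
    return " ".join(best), classifier_score(" ".join(best))


if __name__ == "__main__":
    sentence, score = efsg_sketch()
    print(f"fooling candidate: {sentence!r}  score={score:.3f}")
```

In a real attack, classifier_score would return the model's positive-class probability, and the mutation operators and stopping criterion would follow the paper rather than this toy setup.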
Most adversarial attack methods designed to deceive a text classifier change the classifier's prediction by modifying a few words or characters. Few try to attack classifiers by rewriting a whole sentence, due to the difficulties inherent…
Models for question answering, dialogue agents, and summarization often interpret the meaning of a sentence in a rich context and use that meaning in a new context. Taking excerpts of text can be problematic, as key pieces may not be explicit in a local window…
Atomic clauses are fundamental text units for understanding complex sentences. Identifying the atomic sentences within complex sentences is important for applications such as summarization, argument mining, discourse analysis, discourse parsing, and question answering…
Probabilistic topic models are generative models that describe the content of documents by discovering the latent topics underlying them. However, the structure of the textual input, for instance the grouping of words in coherent text spans such as sentences…
Sentence embedding methods using natural language inference (NLI) datasets have been successfully applied to various tasks. However, these methods are available only for a limited set of languages because they rely heavily on large NLI datasets. In this paper…