
Finding BERT's Idiomatic Key



Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor


Sentence embeddings encode information relating to the usage of idioms in a sentence. This paper reports a set of experiments that combine a probing methodology with input masking to analyse where in a sentence this idiomatic information is taken from, and what form it takes. Our results indicate that BERT's idiomatic key is primarily found within an idiomatic expression, but also draws on information from the surrounding context. Also, BERT can distinguish between the disruption in a sentence caused by words missing and the incongruity caused by idiomatic usage.
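The input-masking idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact masking scheme, so the function names, the example sentence, and the choice of variants below are assumptions for illustration only. In a real experiment, each variant would be embedded with BERT and passed to a probing classifier; that step is omitted here.

```python
from typing import Dict, List, Tuple

MASK = "[MASK]"  # BERT's mask token, used here as a plain string


def mask_span(tokens: List[str], start: int, end: int) -> List[str]:
    """Replace tokens[start:end] with MASK and keep the rest."""
    return [MASK if start <= i < end else t for i, t in enumerate(tokens)]


def mask_outside(tokens: List[str], start: int, end: int) -> List[str]:
    """Keep tokens[start:end] and replace everything else with MASK."""
    return [t if start <= i < end else MASK for i, t in enumerate(tokens)]


def masking_variants(tokens: List[str], idiom_span: Tuple[int, int]) -> Dict[str, List[str]]:
    """Build hypothetical probe inputs: unmasked, idiom hidden, context hidden."""
    start, end = idiom_span
    return {
        "original": list(tokens),
        "idiom_masked": mask_span(tokens, start, end),
        "context_masked": mask_outside(tokens, start, end),
    }


# Illustrative sentence; the idiom occupies token positions 2..4.
tokens = "he finally kicked the bucket last year".split()
variants = masking_variants(tokens, (2, 5))
for name, toks in variants.items():
    print(f"{name:15s} {' '.join(toks)}")
```

Comparing probe accuracy across such variants is one way to ask whether the idiomatic signal sits in the expression itself or in its context.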

References used

https://aclanthology.org/


PIE: A Parallel Idiomatic Expression Corpus for Idiomatic Sentence Generation and Paraphrasing

604 - Association for Computational Linguistics 2021 (article)

Idiomatic expressions (IEs) play an important role in natural language, and have long been a "pain in the neck" for NLP systems. Despite this, text generation tasks related to IEs remain largely under-explored. In this paper, we propose two new tasks of idiomatic sentence generation and paraphrasing to fill this research gap. We introduce a curated dataset of 823 IEs, and a parallel corpus with sentences containing them and the same sentences where the IEs were replaced by their literal paraphrases as the primary resource for our tasks. We benchmark existing deep learning models, which have state-of-the-art performance on related tasks, using automated and manual evaluation with our dataset to inspire further research on our proposed tasks. By establishing baseline models, we pave the way for more comprehensive and accurate modeling of IEs, both for generation and paraphrasing.

Keywords: parallel idiomatic expression, idiomatic expression corpus, idiomatic sentence generation

Finding Pragmatic Differences Between Disciplines

595 - Association for Computational Linguistics 2021 (article)

Scholarly documents have a great degree of variation, both in terms of content (semantics) and structure (pragmatics). Prior work in scholarly document understanding emphasizes semantics through document summarization and corpus topic modeling but tends to omit pragmatics such as document organization and flow. Using a corpus of scholarly documents across 19 disciplines and state-of-the-art language modeling techniques, we learn a fixed set of domain-agnostic descriptors for document sections and "retrofit" the corpus to these descriptors (also referred to as "normalization"). Then, we analyze the position and ordering of these descriptors across documents to understand the relationship between discipline and structure. We report within-discipline structural archetypes, variability, and between-discipline comparisons, supporting the hypothesis that scholarly communities, despite their size, diversity, and breadth, share similar avenues for expressing their work. Our findings lay the foundation for future work in assessing research quality, domain style transfer, and further pragmatic analysis.

Keywords: finding pragmatic differences, pragmatic differences, differences between disciplines

Low-Complexity Probing via Finding Subnetworks

634 - Association for Computational Linguistics 2021 (article)

The dominant approach in probing neural networks for linguistic properties is to train a new shallow multi-layer perceptron (MLP) on top of the model's internal representations. This approach can detect properties encoded in the model, but at the cost of adding new parameters that may learn the task directly. We instead propose a subtractive pruning-based probe, where we find an existing subnetwork that performs the linguistic task of interest. Compared to an MLP, the subnetwork probe achieves both higher accuracy on pre-trained models and lower accuracy on random models, so it is both better at finding properties of interest and worse at learning on its own. Next, by varying the complexity of each probe, we show that subnetwork probing Pareto-dominates MLP probing in that it achieves higher accuracy given any budget of probe complexity. Finally, we analyze the resulting subnetworks across various tasks to locate where each task is encoded, and we find that lower-level tasks are captured in lower layers, reproducing similar findings in past work.

Keywords: pareto-dominates mlp probing, probing pareto-dominates mlp, low-complexity probing

Finding a Balanced Degree of Automation for Summary Evaluation

575 - Association for Computational Linguistics 2021 (article)

Human evaluation for summarization tasks is reliable but brings in issues of reproducibility and high costs. Automatic metrics are cheap and reproducible but sometimes poorly correlated with human judgment. In this work, we propose flexible semiautomatic to automatic summary evaluation metrics, following the Pyramid human evaluation method. Semi-automatic Lite2Pyramid retains the reusable human-labeled Summary Content Units (SCUs) for reference(s) but replaces the manual work of judging SCUs' presence in system summaries with a natural language inference (NLI) model. Fully automatic Lite3Pyramid further substitutes SCUs with automatically extracted Semantic Triplet Units (STUs) via a semantic role labeling (SRL) model. Finally, we propose in-between metrics, Lite2.xPyramid, where we use a simple regressor to predict how well the STUs can simulate SCUs and retain SCUs that are more difficult to simulate, which provides a smooth transition and balance between automation and manual evaluation. Comparing to 15 existing metrics, we evaluate human-metric correlations on 3 existing meta-evaluation datasets and our newly collected PyrXSum (with 100/10 XSum examples/systems). It shows that Lite2Pyramid consistently has the best summary-level correlations; Lite3Pyramid works better than or comparable to other automatic metrics; Lite2.xPyramid trades off small correlation drops for larger manual effort reduction, which can reduce costs for future data collection.
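The step of judging SCU presence automatically can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: `toy_entails` is a hypothetical word-overlap stand-in for a real NLI model, and the unweighted average over SCUs is an assumed aggregation that the abstract does not specify.

```python
from typing import Callable, List


def toy_entails(summary: str, scu: str) -> float:
    """Hypothetical stand-in for an NLI model.

    Returns 1.0 if every word of the SCU occurs in the summary, else 0.0.
    A real system would return an entailment judgment from a trained model.
    """
    summary_words = set(summary.lower().split())
    return 1.0 if all(w in summary_words for w in scu.lower().split()) else 0.0


def scu_presence_score(
    summary: str,
    scus: List[str],
    entails: Callable[[str, str], float] = toy_entails,
) -> float:
    """Average the per-SCU presence judgments (assumed, unweighted aggregation)."""
    if not scus:
        raise ValueError("need at least one SCU")
    return sum(entails(summary, scu) for scu in scus) / len(scus)


# Illustrative data: one SCU is covered by the summary, one is not.
summary = "the probe finds a subnetwork in the model"
scus = ["the probe finds a subnetwork", "the probe adds parameters"]
score = scu_presence_score(summary, scus)
print(score)
```

Passing the judge as a parameter keeps the scoring logic separate from the model, so the toy function can be swapped for an actual NLI classifier without changing the aggregation.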

Keywords: balanced degree, finding a balanced, summary content units

Hopeful NLP@LT-EDI-EACL2021: Finding Hope in YouTube Comment Section

539 - Association for Computational Linguistics 2021 (article)

The proliferation of Hate Speech and misinformation in social media is fast becoming a menace to society. In complement, the dissemination of hate-diffusing, promising and anti-oppressive messages becomes a unique alternative. Unfortunately, due to its complex nature as well as the relatively limited manifestation in comparison to hostile and neutral content, the identification of Hope Speech becomes a challenge. This work revolves around the detection of Hope Speech in YouTube comments, for the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion. We achieve an f-score of 0.93, ranking 1st on the leaderboard for English comments.

Keywords: youtube comment section, hopeful nlp, finding hope
