Searching for Legal Documents at Paragraph Level: Automating Label Generation and Use of an Extended Attention Mask for Boosting Neural Models of Semantic Similarity

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

automating label generation, boosting neural models, automating label

Abstract

Searching for legal documents is a specialized information retrieval task that is relevant for expert users (lawyers and their assistants) and for non-expert users. By searching previous court decisions (cases), a user can better prepare the legal reasoning for a new case. Being able to search with a natural-language text snippet instead of a more artificial query could help prevent query formulation issues. Also, if semantic similarity could be modeled beyond exact lexical matches, more relevant results could be found even when the query terms do not match exactly. For this domain, we formulated a task to compare different ways of modeling semantic similarity at paragraph level, using neural and non-neural systems. We compared systems that encode the query and the search collection paragraphs as vectors, enabling the use of cosine similarity for ranking results. After building a German dataset of cases and statutes from Switzerland, and extracting citations from cases to statutes, we developed an algorithm for estimating semantic similarity at paragraph level, using a link-based similarity method. When evaluating different systems in this way, we find that semantic similarity modeling by neural systems can be boosted with an extended attention mask that quenches noise in the inputs.
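The two ideas named in the abstract, ranking paragraphs by cosine similarity and deriving similarity labels from citation links, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy vectors stand in for real encoder outputs, the function names are hypothetical, and the Jaccard overlap of cited statutes is only one plausible reading of "link-based similarity"; the abstract does not specify the paper's actual algorithm.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_paragraphs(query_vec, paragraph_vecs):
    """Return (paragraph_id, score) pairs sorted by descending cosine similarity."""
    scored = [(pid, cosine_similarity(query_vec, vec))
              for pid, vec in paragraph_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def link_based_similarity(cited_a, cited_b):
    """Jaccard overlap of two collections of cited statutes.

    ASSUMPTION: a stand-in for the paper's link-based method, which the
    abstract does not describe in detail.
    """
    a, b = set(cited_a), set(cited_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Toy paragraph embeddings (hypothetical; a real system would use an encoder).
paragraph_vecs = {
    "p1": [1.0, 0.0, 0.0],
    "p2": [0.7, 0.7, 0.0],
    "p3": [0.0, 0.0, 1.0],
}
query_vec = [1.0, 0.1, 0.0]

ranking = rank_paragraphs(query_vec, paragraph_vecs)
for pid, score in ranking:
    print(pid, round(score, 3))

# Hypothetical statute identifiers cited by two case paragraphs.
label = link_based_similarity(["statute_A", "statute_B"], ["statute_B", "statute_C"])
print("link-based label:", round(label, 3))
```

In a setup like this, the citation-derived label would serve as an automatically generated relevance judgment against which the cosine-based ranking of each system could be scored.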

References used

https://aclanthology.org/

Compositional Generalization for Neural Semantic Parsing via Span-level Supervised Attention

1077 - Association for Computational Linguistics 2021 (article)

We describe a span-level supervised attention loss that improves compositional generalization in semantic parsers. Our approach builds on existing losses that encourage attention maps in neural sequence-to-sequence models to imitate the output of classical word alignment algorithms. Where past work has used word-level alignments, we focus on spans; borrowing ideas from phrase-based machine translation, we align subtrees in semantic parses to spans of input sentences, and encourage neural attention mechanisms to mimic these alignments. This method improves the performance of transformers, RNNs, and structured decoders on three benchmarks of compositional generalization.

span-level supervised attention, neural semantic parsing, span-level supervised

690 - Association for Computational Linguistics 2021 (article)

Multiple-choice questions (MCQs) are widely used in knowledge assessment in educational institutions, during work interviews, and in entertainment quizzes and games. Although research on the automatic or semi-automatic generation of multiple-choice test items has been conducted since the beginning of this millennium, most approaches focus on generating questions from a single sentence. In this research, a state-of-the-art method of creating questions based on multiple sentences is introduced. It was inspired by semantic similarity matches used in the translation memory component of translation management systems. The performance of two deep learning algorithms, doc2vec and SBERT, is compared for the paragraph similarity task. The experiments are performed on the ad-hoc corpus within the EU domain. For the automatic evaluation, a smaller corpus of manually selected matching paragraphs has been compiled. The results prove the good performance of Sentence Embeddings for the given task.

multiple-choice test items, generating multiple-choice test, multiple-choice test

716 - Association for Computational Linguistics 2021 (article)

ROUGE is a widely used evaluation metric in text summarization. However, it is not suitable for the evaluation of abstractive summarization systems as it relies on lexical overlap between the gold standard and the generated summaries. This limitation becomes more apparent for agglutinative languages with very large vocabularies and high type/token ratios. In this paper, we present semantic similarity models for Turkish and apply them as evaluation metrics for an abstractive summarization task. To achieve this, we translated the English STSb dataset into Turkish and presented the first semantic textual similarity dataset for Turkish as well. We showed that our best similarity models have better alignment with average human judgments compared to ROUGE in both Pearson and Spearman correlations.

similarity based evaluation, semantic similarity based, based evaluation

Paragraph-level Simplification of Medical Texts

821 - Association for Computational Linguistics 2021 (article)

We consider the problem of learning to simplify medical texts. This is important because most reliable, up-to-date information in biomedicine is dense with jargon and thus practically inaccessible to the lay audience. Furthermore, manual simplification does not scale to the rapidly growing body of biomedical literature, motivating the need for automated approaches. Unfortunately, there are no large-scale resources available for this task. In this work we introduce a new corpus of parallel texts in English comprising technical and lay summaries of all published evidence pertaining to different clinical topics. We then propose a new metric based on likelihood scores from a masked language model pretrained on scientific texts. We show that this automated measure better differentiates between technical and lay summaries than existing heuristics. We introduce and evaluate baseline encoder-decoder Transformer models for simplification and propose a novel augmentation to these in which we explicitly penalize the decoder for producing "jargon" terms; we find that this yields improvements over baselines in terms of readability.

medical texts, simplify medical texts, paragraph-level simplification

Evaluation Datasets for Cross-lingual Semantic Textual Similarity

825 - Association for Computational Linguistics 2021 (article)

Semantic textual similarity (STS) systems estimate the degree of meaning similarity between two sentences. Cross-lingual STS systems estimate the degree of meaning similarity between two sentences, each in a different language. State-of-the-art algorithms usually employ a strongly supervised, resource-rich approach that is difficult to use for poorly-resourced languages. However, any approach needs evaluation data to confirm the results. In order to simplify the evaluation process for poorly-resourced languages (in terms of STS evaluation datasets), we present new datasets for cross-lingual and monolingual STS for languages without this evaluation data. We also present the results of several state-of-the-art methods on these data, which can be used as a baseline for further research. We believe that this article will not only extend the current STS research to other languages, but will also encourage competition on this new evaluation data.

semantic textual similarity, cross-lingual semantic textual, semantic textual
