Negation scope resolution is key to high-quality information extraction from clinical texts, but so far, efforts to make the encoders used for information extraction negation-aware have been limited to English. We present a universal approach to multilingual negation scope resolution that overcomes the lack of training data by relying on disparate resources in different languages and domains. We evaluate two approaches to learning from these resources: training on combined data, and training in a multi-task learning setup. Our experiments show that zero-shot scope resolution in clinical text is possible, and that combining available resources improves performance in most cases.
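As a rough illustration of the two training regimes named above, the sketch below contrasts pooling all resources into one training set with a round-robin multi-task schedule. It is not the authors' implementation: the resource names (`en_general`, `es_clinical`), the toy sentences, the 0/1 scope labels, and the helper functions `combine` and `multitask_schedule` are all hypothetical, and it only shows how the data might be organized, not how a model is trained.

```python
from itertools import cycle, islice

# Toy, hand-made examples (assumed for illustration, not from the paper).
# Each item is (tokens, labels): 1 marks a token inside a negation scope,
# 0 marks a token outside it (the cue itself is labeled 0 here).
RESOURCES = {
    "en_general": [
        (["She", "did", "not", "answer", "the", "call"], [0, 0, 0, 1, 1, 1]),
    ],
    "es_clinical": [
        (["Sin", "signos", "de", "infección"], [0, 1, 1, 1]),
        (["No", "presenta", "fiebre"], [0, 1, 1]),
    ],
}


def combine(resources):
    """Combined-data regime: pool every example into a single training set."""
    return [example for examples in resources.values() for example in examples]


def multitask_schedule(resources, steps):
    """Multi-task regime: visit resources round-robin and keep the resource
    name, so a shared encoder could route each example to its own head."""
    iterators = {name: cycle(examples) for name, examples in resources.items()}
    order = cycle(resources)  # cycles over resource names in insertion order
    return [(name, next(iterators[name])) for name in islice(order, steps)]


pooled = combine(RESOURCES)
schedule = multitask_schedule(RESOURCES, steps=4)
```

In the pooled variant the origin of each example is discarded, whereas the schedule keeps it. That retained label is what would allow resource-specific output layers on top of a shared encoder in a multi-task setup.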
References used
https://aclanthology.org/