Semi-Automated Labeling of Requirement Datasets for Relation Extraction

69 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Jannik Fischbach

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Jeremias Bohn - Jannik Fischbach - Martin Schmitt

هندسة البرمجيات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Creating datasets manually by human annotators is a laborious task that can lead to biased and inhomogeneous labels. We propose a flexible, semi-automatic framework for labeling data for relation extraction. Furthermore, we provide a dataset of preprocessed sentences from the requirements engineering domain, including a set of automatically created as well as hand-crafted labels. In our case study, we compare the human and automatic labels and show that there is a substantial overlap between both annotations.

قيم البحث

55 - Etienne Andre , Tian Huat Tan , Manman Chen 2020

Service composition aims at achieving a business goal by composing existing service-based applications or components. The response time of a service is crucial especially in time critical business environments, which is often stated as a clause in se rvice level agreements between service providers and service users. To meet the guaranteed response time requirement of a composite service, it is important to select a feasible set of component services such that their response time will collectively satisfy the response time requirement of the composite service. In this work, we use the BPEL modeling language, that aims at specifying Web services. We extend it with timing parameters, and equip it with a formal semantics. Then, we propose a fully automated approach to synthesize the response time requirement of component services modeled using BPEL, in the form of a constraint on the local response times. The synthesized requirement will guarantee the satisfaction of the global response time requirement, statically or dynamically. We implemented our work into a tool, Selamat, and performed several experiments to evaluate the validity of our approach.

هندسة البرمجيات

Loghub: A Large Collection of System Log Datasets towards Automated Log Analytics

422 - Shilin He , Jieming Zhu , Pinjia He 2020

Logs have been widely adopted in software system development and maintenance because of the rich system runtime information they contain. In recent years, the increase of software size and complexity leads to the rapid growth of the volume of logs. T o handle these large volumes of logs efficiently and effectively, a line of research focuses on intelligent log analytics powered by AI (artificial intelligence) techniques. However, only a small fraction of these techniques have reached successful deployment in industry because of the lack of public log datasets and necessary benchmarking upon them. To fill this significant gap between academia and industry and also facilitate more research on AI-powered log analytics, we have collected and organized loghub, a large collection of log datasets. In particular, loghub provides 17 real-world log datasets collected from a wide range of systems, including distributed systems, supercomputers, operating systems, mobile systems, server applications, and standalone software. In this paper, we summarize the statistics of these datasets, introduce some practical log usage scenarios, and present a case study on anomaly detection to demonstrate how loghub facilitates the research and practice in this field. Up to the time of this paper writing, loghub datasets have been downloaded over 15,000 times by more than 380 organizations from both industry and academia.

هندسة البرمجيات

A Review on Semi-Supervised Relation Extraction

149 - Yusen Lin 2021

Relation extraction (RE) plays an important role in extracting knowledge from unstructured text but requires a large amount of labeled corpus. To reduce the expensive annotation efforts, semisupervised learning aims to leverage both labeled and unlab eled data. In this paper, we review and compare three typical methods in semi-supervised RE with deep learning or meta-learning: self-ensembling, which forces consistent under perturbations but may confront insufficient supervision; self-training, which iteratively generates pseudo labels and retrain itself with the enlarged labeled set; dual learning, which leverages a primal task and a dual task to give mutual feedback. Mean-teacher (Tarvainen and Valpola, 2017), LST (Li et al., 2019), and DualRE (Lin et al., 2019) are elaborated as the representatives to alleviate the weakness of these three methods, respectively.

الحساب واللغة الذكاء الاصطناعي استرجاع المعلومات

Curriculum Labeling: Revisiting Pseudo-Labeling for Semi-Supervised Learning

98 - Paola Cascante-Bonilla , Fuwen Tan , Yanjun Qi 2020

In this paper we revisit the idea of pseudo-labeling in the context of semi-supervised learning where a learning algorithm has access to a small set of labeled samples and a large set of unlabeled samples. Pseudo-labeling works by applying pseudo-lab els to samples in the unlabeled set by using a model trained on the combination of the labeled samples and any previously pseudo-labeled samples, and iteratively repeating this process in a self-training cycle. Current methods seem to have abandoned this approach in favor of consistency regularization methods that train models under a combination of different styles of self-supervised losses on the unlabeled samples and standard supervised losses on the labeled samples. We empirically demonstrate that pseudo-labeling can in fact be competitive with the state-of-the-art, while being more resilient to out-of-distribution samples in the unlabeled set. We identify two key factors that allow pseudo-labeling to achieve such remarkable results (1) applying curriculum learning principles and (2) avoiding concept drift by restarting model parameters before each self-training cycle. We obtain 94.91% accuracy on CIFAR-10 using only 4,000 labeled samples, and 68.87% top-1 accuracy on Imagenet-ILSVRC using only 10% of the labeled samples. The code is available at https://github.com/uvavision/Curriculum-Labeling

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط التعلم الالي

Semi-supervised Relation Extraction via Incremental Meta Self-Training

179 - Xuming Hu , Chenwei Zhang , Fukun Ma 2020

To alleviate human efforts from obtaining large-scale annotations, Semi-Supervised Relation Extraction methods aim to leverage unlabeled data in addition to learning from limited samples. Existing self-training methods suffer from the gradual drift p roblem, where noisy pseudo labels on unlabeled data are incorporated during training. To alleviate the noise in pseudo labels, we propose a method called MetaSRE, where a Relation Label Generation Network generates quality assessment on pseudo labels by (meta) learning from the successful and failed attempts on Relation Classification Network as an additional meta-objective. To reduce the influence of noisy pseudo labels, MetaSRE adopts a pseudo label selection and exploitation scheme which assesses pseudo label quality on unlabeled samples and only exploits high-quality pseudo labels in a self-training fashion to incrementally augment labeled samples for both robustness and accuracy. Experimental results on two public datasets demonstrate the effectiveness of the proposed approach.

الحساب واللغة التعلم الآلي

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة حلب

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Semi-Automated Labeling of Requirement Datasets for Relation Extraction

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً