ترغب بنشر مسار تعليمي؟ اضغط هنا

Towards a relation extraction framework for cyber-security concepts

144   0   0.0 ( 0 )
 نشر من قبل Robert Bridges
 تاريخ النشر 2015
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

In order to assist security analysts in obtaining information pertaining to their network, such as novel vulnerabilities, exploits, or patches, information retrieval methods tailored to the security domain are needed. As labeled text data is scarce and expensive, we follow developments in semi-supervised Natural Language Processing and implement a bootstrapping algorithm for extracting security entities and their relationships from text. The algorithm requires little input data, specifically, a few relations or patterns (heuristics for identifying relations), and incorporates an active learning component which queries the user on the most important decisions to prevent drifting from the desired relations. Preliminary testing on a small corpus shows promising results, obtaining precision of .82.

قيم البحث

اقرأ أيضاً

Timely analysis of cyber-security information necessitates automated information extraction from unstructured text. While state-of-the-art extraction methods produce extremely accurate results, they require ample training data, which is generally una vailable for specialized applications, such as detecting security related entities; moreover, manual annotation of corpora is very costly and often not a viable solution. In response, we develop a very precise method to automatically label text from several data sources by leveraging related, domain-specific, structured data and provide public access to a corpus annotated with cyber-security entities. Next, we implement a Maximum Entropy Model trained with the average perceptron on a portion of our corpus ($sim$750,000 words) and achieve near perfect precision, recall, and accuracy, with training times under 17 seconds.
Public disclosure of important security information, such as knowledge of vulnerabilities or exploits, often occurs in blogs, tweets, mailing lists, and other online sources months before proper classification into structured databases. In order to f acilitate timely discovery of such knowledge, we propose a novel semi-supervised learning algorithm, PACE, for identifying and classifying relevant entities in text sources. The main contribution of this paper is an enhancement of the traditional bootstrapping method for entity extraction by employing a time-memory trade-off that simultaneously circumvents a costly corpus search while strengthening pattern nomination, which should increase accuracy. An implementation in the cyber-security domain is discussed as well as challenges to Natural Language Processing imposed by the security domain.
Identification of new concepts in scientific literature can help power faceted search, scientific trend analysis, knowledge-base construction, and more, but current methods are lacking. Manual identification cannot keep up with the torrent of new pub lications, while the precision of existing automatic techniques is too low for many applications. We present an unsupervised concept extraction method for scientific literature that achieves much higher precision than previous work. Our approach relies on a simple but novel intuition: each scientific concept is likely to be introduced or popularized by a single paper that is disproportionately cited by subsequent papers mentioning the concept. From a corpus of computer science papers on arXiv, we find that our method achieves a Precision@1000 of 99%, compared to 86% for prior work, and a substantially better precision-yield trade-off across the top 15,000 extractions. To stimulate research in this area, we release our code and data (https://github.com/allenai/ForeCite).
A capsule is a group of neurons, whose activity vector represents the instantiation parameters of a specific type of entity. In this paper, we explore the capsule networks used for relation extraction in a multi-instance multi-label learning framewor k and propose a novel neural approach based on capsule networks with attention mechanisms. We evaluate our method with different benchmarks, and it is demonstrated that our method improves the precision of the predicted relations. Particularly, we show that capsule networks improve multiple entity pairs relation extraction.
The advancement of biomedical named entity recognition (BNER) and biomedical relation extraction (BRE) researches promotes the development of text mining in biological domains. As a cornerstone of BRE, robust BNER system is required to identify the m entioned NEs in plain texts for further relation extraction stage. However, the current BNER corpora, which play important roles in these tasks, paid less attention to achieve the criteria for BRE task. In this study, we present Revised JNLPBA corpus, the revision of JNLPBA corpus, to broaden the applicability of a NER corpus from BNER to BRE task. We preserve the original entity types including protein, DNA, RNA, cell line and cell type while all the abstracts in JNLPBA corpus are manually curated by domain experts again basis on the new annotation guideline focusing on the specific NEs instead of general terms. Simultaneously, several imperfection issues in JNLPBA are pointed out and made up in the new corpus. To compare the adaptability of different NER systems in Revised JNLPBA and JNLPBA corpora, the F1-measure was measured in three open sources NER systems including BANNER, Gimli and NERSuite. In the same circumstance, all the systems perform average 10% better in Revised JNLPBA than in JNLPBA. Moreover, the cross-validation test is carried out which we train the NER systems on JNLPBA/Revised JNLPBA corpora and access the performance in both protein-protein interaction extraction (PPIE) and biomedical event extraction (BEE) corpora to confirm that the newly refined Revised JNLPBA is a competent NER corpus in biomedical relation application. The revised JNLPBA corpus is freely available at iasl-btm.iis.sinica.edu.tw/BNER/Content/Revised_JNLPBA.zip.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا