أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Michael D. Iannacone

Automatic Labeling for Entity Extraction in Cyber Security

79 - Robert A. Bridges , Corinne L. Jones , Michael D. Iannacone 2013

Timely analysis of cyber-security information necessitates automated information extraction from unstructured text. While state-of-the-art extraction methods produce extremely accurate results, they require ample training data, which is generally una vailable for specialized applications, such as detecting security related entities; moreover, manual annotation of corpora is very costly and often not a viable solution. In response, we develop a very precise method to automatically label text from several data sources by leveraging related, domain-specific, structured data and provide public access to a corpus annotated with cyber-security entities. Next, we implement a Maximum Entropy Model trained with the average perceptron on a portion of our corpus ($sim$750,000 words) and achieve near perfect precision, recall, and accuracy, with training times under 17 seconds.

استرجاع المعلومات الحساب واللغة

PACE: Pattern Accurate Computationally Efficient Bootstrapping for Timely Discovery of Cyber-Security Concepts

130 - Nikki McNeil , Robert A. Bridges , Michael D. Iannacone 2013

Public disclosure of important security information, such as knowledge of vulnerabilities or exploits, often occurs in blogs, tweets, mailing lists, and other online sources months before proper classification into structured databases. In order to f acilitate timely discovery of such knowledge, we propose a novel semi-supervised learning algorithm, PACE, for identifying and classifying relevant entities in text sources. The main contribution of this paper is an enhancement of the traditional bootstrapping method for entity extraction by employing a time-memory trade-off that simultaneously circumvents a costly corpus search while strengthening pattern nomination, which should increase accuracy. An implementation in the cyber-security domain is discussed as well as challenges to Natural Language Processing imposed by the security domain.

استرجاع المعلومات الحساب واللغة

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد