Automatic Yara Rule Generation Using Biclustering

Published by: Edward Raff

Publication date: 2020

Research field: Informatics Engineering

Research language: English

Authors: Edward Raff, Richard Zak, Gary Lopez Munoz

Cryptography and Security, Information Retrieval, Machine Learning

Yara rules are a ubiquitous tool among cybersecurity practitioners and analysts. Developing high-quality Yara rules to detect a malware family of interest can be labor- and time-intensive, even for expert users. Few tools exist and relatively little work has been done on how to automate the generation of Yara rules for specific families. In this paper, we leverage large n-grams ($n \geq 8$) combined with a new biclustering algorithm to construct simple Yara rules more effectively than currently available software. Our method, AutoYara, is fast, allowing for deployment on low-resource equipment for teams that deploy to remote networks. Our results demonstrate that AutoYara can help reduce analyst workload by producing rules with useful true-positive rates while maintaining low false-positive rates, sometimes matching or even outperforming human analysts. In addition, real-world testing by malware analysts indicates AutoYara could reduce analyst time spent constructing Yara rules by 44-86%, allowing them to spend their time on the more advanced malware that current tools can't handle. Code will be made available at https://github.com/NeuromorphicComputationResearchProgram .
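To make the idea of building rules from large byte n-grams concrete, the following is a minimal Python sketch. It is not AutoYara and does not implement the paper's biclustering step; it only keeps 8-byte n-grams that appear in every sample and renders them as hex strings in a Yara rule. The function names (`byte_ngrams`, `shared_ngrams`, `to_yara_rule`), the `min_fraction` parameter, and the toy samples are assumptions made for illustration.

```python
from __future__ import annotations

from collections import Counter


def byte_ngrams(data: bytes, n: int = 8) -> set[bytes]:
    """Return the set of distinct n-byte substrings of data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def shared_ngrams(samples: list[bytes], n: int = 8,
                  min_fraction: float = 1.0) -> list[bytes]:
    """N-grams present in at least min_fraction of the samples (sorted)."""
    counts: Counter = Counter()
    for sample in samples:
        # Each sample contributes at most one count per n-gram.
        counts.update(byte_ngrams(sample, n))
    threshold = min_fraction * len(samples)
    return sorted(g for g, c in counts.items() if c >= threshold)


def to_yara_rule(name: str, ngrams: list[bytes], min_matches: int) -> str:
    """Render n-grams as Yara hex strings; fire when min_matches are found."""
    if not ngrams:
        raise ValueError("a Yara rule needs at least one string")
    lines = [f"rule {name}", "{", "    strings:"]
    for i, gram in enumerate(ngrams):
        hex_bytes = " ".join(f"{b:02X}" for b in gram)
        lines.append(f"        $s{i} = {{ {hex_bytes} }}")
    lines += ["    condition:", f"        {min_matches} of them", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy stand-ins for malware samples (hypothetical data).
    samples = [b"xxHELLO_WORLDyy", b"HELLO_WORLD!!", b"zzzHELLO_WORLD"]
    grams = shared_ngrams(samples, n=8)
    print(to_yara_rule("toy_family", grams, min_matches=2))
```

A real generator would also have to discard n-grams that are common in benign files and decide how to group n-grams into rule clauses, which is where the paper's biclustering comes in; this sketch leaves both out.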

Wentao Ding, Jianhao Chen, Jinmao Li, 2021

The understanding of time expressions includes two sub-tasks: recognition and normalization. In recent years, significant progress has been made in the recognition of time expressions, while research on normalization has lagged behind. Existing SOTA normalization methods rely heavily on rules or grammars designed by experts, which limits their performance on emerging corpora such as social media texts. In this paper, we model time expression normalization as a sequence of operations to construct the normalized temporal value, and we present a novel method called ARTime, which can automatically generate normalization rules from training data without expert intervention. Specifically, ARTime automatically captures possible operation sequences from annotated data and generates normalization rules for time expressions with common surface forms. The experimental results show that ARTime can significantly surpass SOTA methods on the Tweets benchmark and achieves results competitive with existing expert-engineered rule methods on the TempEval-3 benchmark.

Computation and Language

Automatic Instrument Recognition in Polyphonic Music Using Convolutional Neural Networks

Peter Li, Jiyuan Qian, Tian Wang, 2015

Traditional methods to tackle many music information retrieval tasks typically follow a two-step architecture: feature engineering followed by a simple learning algorithm. In these shallow architectures, feature engineering and learning are typically disjoint and unrelated. Additionally, feature engineering is difficult, and typically depends on extensive domain expertise. In this paper, we present an application of convolutional neural networks for the task of automatic musical instrument identification. In this model, feature extraction and learning algorithms are trained together in an end-to-end fashion. We show that a convolutional neural network trained on raw audio can achieve performance surpassing traditional methods that rely on hand-crafted features.

Computer Audio Systems, Information Retrieval, Machine Learning

AAG-Stega: Automatic Audio Generation-based Steganography

Zhongliang Yang, Xingjian Du, Yilin Tan, 2018

Steganography, as one of the three basic information security systems, has long played an important role in safeguarding the privacy and confidentiality of data in cyberspace. Audio is one of the most common means of information transmission in our daily life, so using audio as a carrier for information hiding is of great practical significance. At present, almost all audio-based information hiding methods are based on the carrier modification mode. However, this mode is equivalent to adding noise to the original signal, resulting in a difference in the statistical feature distribution of the carrier before and after steganography, which impairs the concealment of the entire system. In this paper, we propose automatic audio generation-based steganography (AAG-Stega), which can automatically generate high-quality audio covers on the basis of the secret bitstream that needs to be embedded. In the automatic audio generation process, we encode the conditional probability distribution space of each sampling point and select the corresponding signal output according to the bitstream to embed the secret information. We designed several experiments to test the proposed model from the perspectives of information imperceptibility and hidden capacity. The experimental results show that the proposed model can guarantee high hidden capacity and concealment at the same time.

Cryptography and Security

CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT

Akshay Smit, Saahil Jain, Pranav Rajpurkar, 2020

The extraction of labels from radiology text reports enables large-scale training of medical imaging models. Existing approaches to report labeling typically rely either on sophisticated feature engineering based on medical domain knowledge or manual annotations by experts. In this work, we introduce a BERT-based approach to medical image report labeling that exploits both the scale of available rule-based systems and the quality of expert annotations. We demonstrate superior performance of a biomedically pretrained BERT model first trained on annotations of a rule-based labeler and then finetuned on a small set of expert annotations augmented with automated backtranslation. We find that our final model, CheXbert, is able to outperform the previous best rules-based labeler with statistical significance, setting a new SOTA for report labeling on one of the largest datasets of chest x-rays.

Computation and Language, Information Retrieval, Machine Learning

Studying Ransomware Attacks Using Web Search Logs

Chetan Bansal, Pantazis Deligiannis, Chandra Maddila, 2020

Cyber attacks are becoming increasingly prevalent and causing significant damage to individuals, businesses, and even countries. In particular, ransomware attacks have grown significantly over the last decade. We present the first study on mining insights about ransomware attacks by analyzing query logs from the Bing web search engine. We first extract ransomware-related queries and then build a machine learning model to identify queries where users are seeking support for ransomware attacks. We show that user search behavior and characteristics are correlated with ransomware attacks. We also analyze temporal and geographical trends and validate our findings against publicly available information. Lastly, we present a case study on Nemty, a popular ransomware, to show that it is possible to derive accurate insights about cyber attacks through query log analysis.

Cryptography and Security, Information Retrieval