New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Integration of Static and Dynamic Analysis for Malware Family Classification with Composite Neural Network

153 0 0.0 ( 0 )

Download Cite

Added by Sam Yen

Publication date 2019

fields Informatics Engineering

and research's language is English

Authors Yao Saint Yen - Zhe Wei Chen - Ying Ren Guo

Cryptography and Security Machine Learning

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Deep learning has been used in the research of malware analysis. Most classification methods use either static analysis features or dynamic analysis features for malware family classification, and rarely combine them as classification features and also no extra effort is spent integrating the two types of features. In this paper, we combine static and dynamic analysis features with deep neural networks for Windows malware classification. We develop several methods to generate static and dynamic analysis features to classify malware in different ways. Given these features, we conduct experiments with composite neural network, showing that the proposed approach performs best with an accuracy of 83.17% on a total of 80 malware families with 4519 malware samples. Additionally, we show that using integrated features for malware family classification outperforms using static features or dynamic features alone. We show how static and dynamic features complement each other for malware classification.

rate research

Dynamic Malware Analysis with Feature Engineering and Feature Learning

127 - Zhaoqi Zhang , Panpan Qi , Wei Wang 2019

Dynamic malware analysis executes the program in an isolated environment and monitors its run-time behaviour (e.g. system API calls) for malware detection. This technique has been proven to be effective against various code obfuscation techniques and newly released (zero-day) malware. However, existing works typically only consider the API name while ignoring the arguments, or require complex feature engineering operations and expert knowledge to process the arguments. In this paper, we propose a novel and low-cost feature extraction approach, and an effective deep neural network architecture for accurate and fast malware detection. Specifically, the feature representation approach utilizes a feature hashing trick to encode the API call arguments associated with the API name. The deep neural network architecture applies multiple Gated-CNNs (convolutional neural networks) to transform the extracted features of each API call. The outputs are further processed through bidirectional LSTM (long-short term memory networks) to learn the sequential correlation among API calls. Experiments show that our solution outperforms baselines significantly on a large real dataset. Valuable insights about feature engineering and architecture design are derived from the ablation study.

Cryptography and Security Machine Learning

Android Malware Family Classification Based on Resource Consumption over Time

97 - Luca Massarelli , Leonardo Aniello , Claudio Ciccotelli 2017

The vast majority of todays mobile malware targets Android devices. This has pushed the research effort in Android malware analysis in the last years. An important task of malware analysis is the classification of malware samples into known families. Static malware analysis is known to fall short against techniques that change static characteristics of the malware (e.g. code obfuscation), while dynamic analysis has proven effective against such techniques. To the best of our knowledge, the most notable work on Android malware family classification purely based on dynamic analysis is DroidScribe. With respect to DroidScribe, our approach is easier to reproduce. Our methodology only employs publicly available tools, does not require any modification to the emulated environment or Android OS, and can collect data from physical devices. The latter is a key factor, since modern mobile malware can detect the emulated environment and hide their malicious behavior. Our approach relies on resource consumption metrics available from the proc file system. Features are extracted through detrended fluctuation analysis and correlation. Finally, a SVM is employed to classify malware into families. We provide an experimental evaluation on malware samples from the Drebin dataset, where we obtain a classification accuracy of 82%, proving that our methodology achieves an accuracy comparable to that of DroidScribe. Furthermore, we make the software we developed publicly available, to ease the reproducibility of our results.

Cryptography and Security

Malware-on-the-Brain: Illuminating Malware Byte Codes with Images for Malware Classification

176 - Fangtian Zhong , Zekai Chen , Minghui Xu 2021

Malware is a piece of software that was written with the intent of doing harm to data, devices, or people. Since a number of new malware variants can be generated by reusing codes, malware attacks can be easily launched and thus become common in recent years, incurring huge losses in businesses, governments, financial institutes, health providers, etc. To defeat these attacks, malware classification is employed, which plays an essential role in anti-virus products. However, existing works that employ either static analysis or dynamic analysis have major weaknesses in complicated reverse engineering and time-consuming tasks. In this paper, we propose a visualized malware classification framework called VisMal, which provides highly efficient categorization with acceptable accuracy. VisMal converts malware samples into images and then applies a contrast-limited adaptive histogram equalization algorithm to enhance the similarity between malware image regions in the same family. We provided a proof-of-concept implementation and carried out an extensive evaluation to verify the performance of our framework. The evaluation results indicate that VisMal can classify a malware sample within 5.2ms and have an average accuracy of 96.0%. Moreover, VisMal provides security engineers with a simple visualization approach to further validate its performance.

Cryptography and Security

Neurlux: Dynamic Malware Analysis Without Feature Engineering

137 - Chani Jindal , Christopher Salls , Hojjat Aghakhani 2019

Malware detection plays a vital role in computer security. Modern machine learning approaches have been centered around domain knowledge for extracting malicious features. However, many potential features can be used, and it is time consuming and difficult to manually identify the best features, especially given the diverse nature of malware. In this paper, we propose Neurlux, a neural network for malware detection. Neurlux does not rely on any feature engineering, rather it learns automatically from dynamic analysis reports that detail behavioral information. Our model borrows ideas from the field of document classification, using word sequences present in the reports to predict if a report is from a malicious binary or not. We investigate the learned features of our model and show which components of the reports it tends to give the highest importance. Then, we evaluate our approach on two different datasets and report formats, showing that Neurlux improves on the state of the art and can effectively learn from the dynamic analysis reports. Furthermore, we show that our approach is portable to other malware analysis environments and generalizes to different datasets.

Cryptography and Security

Reviewer Integration and Performance Measurement for Malware Detection

93 - Brad Miller , Alex Kantchelian , Michael Carl Tschantz 2015

We present and evaluate a large-scale malware detection system integrating machine learning with expert reviewers, treating reviewers as a limited labeling resource. We demonstrate that even in small numbers, reviewers can vastly improve the systems ability to keep pace with evolving threats. We conduct our evaluation on a sample of VirusTotal submissions spanning 2.5 years and containing 1.1 million binaries with 778GB of raw feature data. Without reviewer assistance, we achieve 72% detection at a 0.5% false positive rate, performing comparable to the best vendors on VirusTotal. Given a budget of 80 accurate reviews daily, we improve detection to 89% and are able to detect 42% of malicious binaries undetected upon initial submission to VirusTotal. Additionally, we identify a previously unnoticed temporal inconsistency in the labeling of training datasets. We compare the impact of training labels obtained at the same time training data is first seen with training labels obtained months later. We find that using training labels obtained well after samples appear, and thus unavailable in practice for current training data, inflates measured detection by almost 20 percentage points. We release our cluster-based implementation, as well as a list of all hashes in our evaluation and 3% of our entire dataset.

Cryptography and Security

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Integration of Static and Dynamic Analysis for Malware Family Classification with Composite Neural Network

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions