
Malware-on-the-Brain: Illuminating Malware Byte Codes with Images for Malware Classification

Added by Fangtian Zhong
Publication date: 2021
Language: English





Malware is software written with the intent of harming data, devices, or people. Because new malware variants can be generated by reusing code, malware attacks are easy to launch and have become common in recent years, incurring huge losses for businesses, governments, financial institutions, health providers, and others. To defeat these attacks, malware classification, which plays an essential role in anti-virus products, is employed. However, existing approaches that employ either static or dynamic analysis suffer from major weaknesses: complicated reverse engineering and time-consuming analysis tasks. In this paper, we propose a visualized malware classification framework called VisMal, which provides highly efficient categorization with acceptable accuracy. VisMal converts malware samples into images and then applies a contrast-limited adaptive histogram equalization (CLAHE) algorithm to enhance the similarity between malware image regions within the same family. We provide a proof-of-concept implementation and carry out an extensive evaluation to verify the performance of our framework. The evaluation results indicate that VisMal can classify a malware sample within 5.2 ms with an average accuracy of 96.0%. Moreover, VisMal gives security engineers a simple visualization approach for further validating its performance.
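To make the enhancement step concrete, here is a minimal sketch of the general byte-to-image pipeline the abstract describes, using OpenCV's CLAHE implementation. The file path, image width, and CLAHE parameters are illustrative assumptions, not the authors' settings.

# Sketch: raw bytes -> grayscale image -> contrast-limited adaptive
# histogram equalization (CLAHE). Path and parameters are hypothetical.
import numpy as np
import cv2

def malware_to_image(path, width=256):
    """Reshape a binary file's bytes into a 2-D grayscale image."""
    data = np.fromfile(path, dtype=np.uint8)
    height = len(data) // width
    return data[:height * width].reshape(height, width)

def enhance(image, clip_limit=2.0, tile_size=8):
    """Apply CLAHE so byte-pattern textures within a family align better."""
    clahe = cv2.createCLAHE(clipLimit=clip_limit,
                            tileGridSize=(tile_size, tile_size))
    return clahe.apply(image)

img = malware_to_image("sample.bin")  # hypothetical sample path
cv2.imwrite("sample_clahe.png", enhance(img))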



Related Research

Modern commercial antivirus systems increasingly rely on machine learning to keep up with the rampant growth of new malware. However, it is well known that machine learning models are vulnerable to adversarial examples (AEs). Previous works have shown that ML malware classifiers are fragile under white-box adversarial attacks. However, the ML models used in commercial antivirus products are usually not available to attackers and return only hard classification labels. It is therefore more practical to evaluate the robustness of ML models and real-world AVs in a pure black-box manner. We propose a black-box Reinforcement Learning (RL) based framework to generate AEs for PE malware classifiers and AV engines. It treats the adversarial attack problem as a multi-armed bandit problem, which finds an optimal balance between exploiting successful patterns and exploring more varieties. Compared to other frameworks, ours improves on three points: 1) it limits the exploration space by modeling the generation process as a stateless process, avoiding combinatorial explosion; 2) because the payload plays a critical role in AE generation, it reuses successful payloads during modeling; 3) it minimizes the changes to each AE sample so that rewards are assigned correctly during RL training, which also helps identify the root cause of evasions. As a result, our framework achieves much higher black-box evasion rates than other off-the-shelf frameworks: 74%-97% against two state-of-the-art ML detectors and 32%-48% against commercial AVs in a pure black-box setting. We also demonstrate that the transferability of adversarial attacks among ML-based classifiers is higher than the transferability between purely ML-based classifiers and commercial AVs.
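As a rough illustration of the stateless bandit formulation, the sketch below runs an epsilon-greedy multi-armed bandit over a set of hypothetical PE mutation actions. The action names and the detects() stub stand in for real mutations and a hard-label black-box AV oracle; they are assumptions, not the paper's framework.

# Epsilon-greedy bandit over mutation "arms"; each pull applies one
# mutation (stateless, no action chains) and queries the detector stub.
import random

ACTIONS = ["append_benign_bytes", "add_section", "pad_overlay", "rename_section"]

def detects(action):
    """Stub for a hard-label black-box AV verdict after one mutation."""
    return random.random() < 0.8

def bandit_evade(rounds=200, eps=0.1):
    counts = {a: 0 for a in ACTIONS}
    wins = {a: 0 for a in ACTIONS}
    for _ in range(rounds):
        # Explore with probability eps; otherwise exploit the best success rate.
        if random.random() < eps or sum(counts.values()) == 0:
            arm = random.choice(ACTIONS)
        else:
            arm = max(ACTIONS, key=lambda a: wins[a] / (counts[a] or 1))
        counts[arm] += 1
        if not detects(arm):
            wins[arm] += 1  # reward the successful payload so it gets reused
    return max(ACTIONS, key=lambda a: wins[a] / (counts[a] or 1))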
Large software platforms (e.g., mobile app stores, social media, email service providers) must ensure that files on their platform do not contain malicious code. Platform hosts use security tools to analyze those files for potential malware. However, given the expensive runtimes of these tools and the large number of exchanged files, platforms cannot run every tool on every incoming file. Moreover, malicious parties seek gaps in the coverage of the analysis tools and exchange files containing malware that exploits these vulnerabilities. To address this problem, we present a novel approach that models the relationship between malicious parties and the security analyst as a leader-follower Stackelberg security game. To estimate the parameters of our model, we combine information from the VirusTotal dataset with the more detailed reports from the National Vulnerability Database. Compared to a set of natural baselines, we show that our model computes an optimal randomization over sets of available security analysis tools.
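The toy below conveys the flavor of such an optimal randomization: a linear program that picks a mixed strategy over tools maximizing worst-case detection, anticipating that the attacker best-responds with the weakest-covered exploit class. This zero-sum simplification (where minimax and Stackelberg solutions coincide) and the detection matrix D are invented for illustration; they are unrelated to the paper's VirusTotal/NVD estimates.

# Maximize t subject to D^T p >= t, sum(p) = 1, p >= 0, where
# D[j][i] = probability tool j flags an exploit of class i (made up).
import numpy as np
from scipy.optimize import linprog

D = np.array([[0.9, 0.2, 0.4],
              [0.3, 0.8, 0.5],
              [0.5, 0.4, 0.7]])
n_tools, n_attacks = D.shape

c = np.zeros(n_tools + 1)
c[-1] = -1.0                                       # minimize -t == maximize t
A_ub = np.hstack([-D.T, np.ones((n_attacks, 1))])  # t - D^T p <= 0
b_ub = np.zeros(n_attacks)
A_eq = [[1.0] * n_tools + [0.0]]                   # probabilities sum to 1
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
              bounds=[(0, 1)] * (n_tools + 1))
print("tool mix:", res.x[:-1].round(3),
      "worst-case detection:", round(res.x[-1], 3))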
The vast majority of today's mobile malware targets Android devices. This has driven research effort in Android malware analysis in recent years. An important task of malware analysis is the classification of malware samples into known families. Static malware analysis is known to fall short against techniques that change the static characteristics of malware (e.g., code obfuscation), while dynamic analysis has proven effective against such techniques. To the best of our knowledge, the most notable work on Android malware family classification based purely on dynamic analysis is DroidScribe. Compared with DroidScribe, our approach is easier to reproduce: our methodology employs only publicly available tools, requires no modification to the emulated environment or the Android OS, and can collect data from physical devices. The latter is a key factor, since modern mobile malware can detect the emulated environment and hide its malicious behavior. Our approach relies on resource-consumption metrics available from the proc file system. Features are extracted through detrended fluctuation analysis and correlation, and an SVM is then employed to classify malware into families. We provide an experimental evaluation on malware samples from the Drebin dataset, where we obtain a classification accuracy of 82%, showing that our methodology achieves an accuracy comparable to that of DroidScribe. Furthermore, we make the software we developed publicly available to ease the reproducibility of our results.
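The sketch below illustrates the feature pipeline in miniature: a detrended fluctuation analysis (DFA) scaling exponent computed from a resource-consumption time series, fed to an SVM. The synthetic series and labels are stand-ins for real /proc metrics, and the paper's feature set also includes correlation measures.

# DFA: integrate the series, detrend fixed-size windows with a linear fit,
# and fit the log-log slope of fluctuation vs. window size.
import numpy as np
from sklearn.svm import SVC

def dfa_exponent(series, scales=(4, 8, 16, 32, 64)):
    profile = np.cumsum(series - np.mean(series))
    fluctuations = []
    for n in scales:
        rms = []
        for i in range(len(profile) // n):
            window = profile[i * n:(i + 1) * n]
            x = np.arange(n)
            trend = np.polyval(np.polyfit(x, window, 1), x)
            rms.append(np.sqrt(np.mean((window - trend) ** 2)))
        fluctuations.append(np.mean(rms))
    alpha, _ = np.polyfit(np.log(scales), np.log(fluctuations), 1)
    return alpha

rng = np.random.default_rng(0)
X = np.array([[dfa_exponent(rng.normal(size=512))] for _ in range(40)])
y = rng.integers(0, 2, size=40)       # fake family labels for illustration
clf = SVC(kernel="rbf").fit(X, y)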
Although state-of-the-art PDF malware classifiers can be trained with almost perfect test accuracy (99%) and an extremely low false positive rate (under 0.1%), it has been shown that even a simple adversary can evade them. A practically useful malware classifier must be robust against evasion attacks, but achieving such robustness is an extremely challenging task. In this paper, we take the first steps toward training robust PDF malware classifiers with verifiable robustness properties. For instance, a robustness property can enforce that no matter how many pages from benign documents are inserted into a PDF malware sample, the classifier must still classify it as malicious. We demonstrate how the worst-case behavior of a malware classifier with respect to specific robustness properties can be formally verified. Furthermore, we find that training classifiers to satisfy formally verified robustness properties can increase the evasion cost for unbounded attackers (i.e., attackers not bounded by the robustness properties) by eliminating simple evasion attacks. Specifically, we propose a new distance metric that operates on the PDF tree structure and specify two classes of robustness properties, covering subtree insertions and deletions. We utilize a state-of-the-art verifiably robust training method to build robust PDF malware classifiers. Our results show that we can achieve 92.27% average verified robust accuracy over three properties, while maintaining 99.74% accuracy and a 0.56% false positive rate. With simple robustness properties, our robust model maintains 7% higher robust accuracy than all baseline models against unrestricted white-box attacks. Moreover, state-of-the-art and new adaptive evolutionary attackers need up to 10 times larger $L_0$ feature distance and 21 times more basic PDF mutations (e.g., inserting and deleting objects) to evade our robust model than the baselines.
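The following toy captures the spirit of a distance defined on the PDF tree structure: counting the subtrees that must be inserted or deleted to transform one object tree into another. The nested-dict representation and the example documents are assumptions for illustration, not the paper's metric implementation.

# Count inserted/deleted subtrees between two nested-dict "PDF trees".
def subtree_distance(a, b):
    if not isinstance(a, dict) or not isinstance(b, dict):
        return 0 if a == b else 1
    dist = 0
    for k in set(a) | set(b):
        if k not in a or k not in b:
            dist += 1                  # whole subtree inserted or deleted
        else:
            dist += subtree_distance(a[k], b[k])
    return dist

doc = {"/Root": {"/Pages": {"/Type": "/Pages", "/Count": 1}}}
evasive = {"/Root": {"/Pages": {"/Type": "/Pages", "/Count": 2},
                     "/Names": {"/JavaScript": "app.alert(1)"}}}
print(subtree_distance(doc, evasive))  # -> 2 (one edit, one insertion)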
Machine learning (ML) classifiers are vulnerable to adversarial examples. An adversarial example is an input sample that is slightly modified to induce misclassification in an ML classifier. In this work, we investigate white-box and grey-box evasion attacks against an ML-based malware detector and conduct performance evaluations in a real-world setting. We compare defense approaches for mitigating these attacks and propose a framework for deploying grey-box and black-box attacks against malware detection systems.
