Unsupervised Detection and Clustering of Malicious TLS Flows



Added by Gibran Montes

Publication date 2021

Field: Informatics Engineering

Language: English

Authors: Gibran Gomez, Platon Kotzias, Matteo Dell'Amico

Cryptography and Security


Abstract

Malware abuses TLS to encrypt its malicious traffic, preventing examination by content signatures and deep packet inspection. Network detection of malicious TLS flows is an important, but challenging, problem. Prior works have proposed supervised machine learning detectors using TLS features. However, by trying to represent all malicious traffic, supervised binary detectors produce models that are too loose, thus introducing errors. Furthermore, they do not distinguish flows generated by different malware. On the other hand, supervised multi-class detectors produce tighter models and can classify flows by malware family, but require family labels, which are not available for many samples. To address these limitations, this work proposes a novel unsupervised approach to detect and cluster malicious TLS flows. Our approach takes as input network traces from sandboxes. It clusters similar TLS flows using 90 features that capture properties of the TLS client, TLS server, certificate, and encrypted payload; and uses the clusters to build an unsupervised detector that can assign a malicious flow to the cluster it belongs to, or determine it is benign. We evaluate our approach using 972K traces from a commercial sandbox and 35M TLS flows from a research network. Our unsupervised detector achieves an F1 score of 0.91, compared to 0.82 for the state-of-the-art supervised detector. The false detection rate of our detector is 0.032%, measured over four months of traffic.
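The abstract describes the detector's decision only at a high level: a new flow is either assigned to the malicious cluster it belongs to or declared benign. A minimal sketch of one such decision rule follows, under illustrative assumptions that are not taken from the paper: feature vectors are already extracted and normalized, each cluster is summarized by a centroid and a per-cluster acceptance radius, and distance is Euclidean. The names `Cluster` and `classify_flow`, and the toy three-dimensional vectors, are hypothetical; the real feature space has 90 dimensions and the paper's clustering algorithm is not specified here.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    label: str              # hypothetical cluster identifier
    centroid: list[float]   # mean feature vector of the cluster's flows
    radius: float           # hypothetical acceptance radius around the centroid


def classify_flow(features, clusters):
    """Return the label of the nearest cluster whose radius covers the flow, else 'benign'."""
    best_label, best_dist = "benign", math.inf
    for c in clusters:
        d = math.dist(features, c.centroid)  # Euclidean distance
        if d <= c.radius and d < best_dist:
            best_label, best_dist = c.label, d
    return best_label


clusters = [
    Cluster("cluster-A", [0.0, 0.0, 1.0], 0.5),
    Cluster("cluster-B", [1.0, 1.0, 0.0], 0.5),
]
print(classify_flow([0.1, 0.0, 0.9], clusters))  # cluster-A
print(classify_flow([5.0, 5.0, 5.0], clusters))  # benign
```

A radius-based rule is one simple way to let the detector answer "benign" for flows that lie far from every malicious cluster, which a plain nearest-centroid classifier cannot do.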


Vulnerability and Transaction behavior based detection of Malicious Smart Contracts

Rachit Agarwal, Tanmay Thapliyal, Sandeep Kumar Shukla (2021)

Smart Contracts (SCs) in Ethereum can automate tasks and provide different functionalities to a user. Such automation is enabled by the Turing-complete nature of the programming language (Solidity) in which SCs are written. This also opens up different vulnerabilities and bugs in SCs that malicious actors exploit to carry out malicious or illegal activities on the cryptocurrency platform. In this work, we study the correlation between malicious activities and the vulnerabilities present in SCs and find that some malicious activities are correlated with certain types of vulnerabilities. We then develop and study the feasibility of a scoring mechanism that corresponds to the severity of the vulnerabilities present in SCs to determine whether it is a relevant feature for identifying suspicious SCs. We analyze the utility of the severity score for detecting suspicious SCs using unsupervised machine learning (ML) algorithms across different temporal granularities and identify behavioral changes. In our experiments with on-chain SCs, after including the smart contract vulnerability scores in the feature set, we found a total of 1094 benign SCs across different granularities that behave similarly to malicious SCs.

Cryptography and Security Distributed Parallel and Cluster Computing Machine Learning

Droidetec: Android Malware Detection and Malicious Code Localization through Deep Learning

Zhuo Ma, Haoran Ge, Zhuzhu Wang (2020)

Android malware detection is a critical step towards building a trustworthy, secure system. In particular, manually searching for potentially malicious code has long plagued program analysts. In this paper, we propose Droidetec, a deep learning based method for Android malware detection and malicious code localization, which models an application program as a natural-language sequence. Droidetec adopts a novel feature extraction method to derive behavior sequences from Android applications. Based on these sequences, a bidirectional Long Short-Term Memory (LSTM) network is used for malware detection. Each unit in the extracted behavior sequence is represented as a vector, which allows Droidetec to automatically analyze the semantics of sequence segments and eventually locate the malicious code. Experiments with 9616 malicious and 11982 benign programs show that Droidetec reaches an accuracy of 97.22% and an F1-score of 98.21%. Overall, Droidetec correctly locates malicious code segments with a hit rate of 91%.

Cryptography and Security

Early Detection of In-Memory Malicious Activity based on Run-time Environmental Features

Dorel Yaffe, Danny Hendler (2021)

In recent years, malware has become increasingly sophisticated and difficult to detect prior to exploitation. While there are plenty of approaches to malware detection, they all have shortcomings when it comes to correctly identifying malware prior to exploitation. The trade-off is usually between false positives, which cause overhead and interfere with normal usage, and the risk of letting the malware execute and damage the target. We present a novel end-to-end solution for in-memory malicious activity detection prior to exploitation, which leverages machine learning on data from unique run-time logs that are carefully curated to detect malicious activity in the memory of protected processes. This solution achieves reduced overhead and fewer false positives, as well as deployment simplicity. We implemented our solution for Windows-based systems, employing multidisciplinary knowledge from malware research, machine learning, and operating system internals. Our experimental evaluation yielded promising results. As we expect that future sophisticated malware may try to bypass it, we also discuss how our solution can be extended to thwart such bypassing attempts.

Cryptography and Security Machine Learning

Killing Two Birds with One Stone: Malicious Domain Detection with High Accuracy and Coverage

Issa Khalil, Bei Guan, Mohamed Nabeel (2017)

Inference-based techniques are one of the major approaches to analyzing DNS data and detecting malicious domains. The key idea of inference techniques is to first define associations between domains based on features extracted from DNS data. Then, an inference algorithm is deployed to infer potentially malicious domains based on their direct or indirect associations with known malicious ones. The way associations are defined is key to the effectiveness of an inference technique. Associations should be both accurate (i.e., avoid falsely associating domains with no meaningful connections) and have good coverage (i.e., identify all associations between domains with meaningful connections). Due to the limited scope of information provided by DNS data, it is challenging to design an association scheme that achieves both high accuracy and good coverage. In this paper, we propose a new association scheme to identify domains controlled by the same entity. Our key idea is an in-depth analysis of active DNS data to accurately separate public IPs from dedicated ones, which enables us to build high-quality associations between domains. Our scheme identifies many meaningful connections between domains that are discarded by existing state-of-the-art approaches. Our experimental results show that the proposed association scheme not only significantly improves domain coverage compared to existing approaches but also achieves better detection accuracy. The existing path-based inference algorithm is specifically designed for DNS data analysis. It is effective but computationally expensive. As an alternative, we investigate the effectiveness of combining our association scheme with the generic belief propagation algorithm. Through comprehensive experiments, we show that this approach offers significant efficiency and scalability improvements with only a minor negative impact on detection accuracy.

Cryptography and Security
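The abstract pairs the association scheme with generic belief propagation but does not state the potentials or priors used. A minimal sketch of textbook loopy belief propagation on a binary (benign/malicious) domain graph follows. It assumes the association edges have already been computed, uses a homophily edge potential in which associated domains disagree with probability `eps`, and gives unlabeled domains a 0.5 prior. The function names, `eps`, the prior values, and the example domains are all hypothetical.

```python
from collections import defaultdict

STATES = (0, 1)  # 0 = benign, 1 = malicious


def edge_potential(a, b, eps):
    # Homophily: associated domains share a label with probability 1 - eps.
    return 1.0 - eps if a == b else eps


def belief_propagation(edges, priors, eps=0.1, iterations=20):
    """Loopy belief propagation over an undirected domain-association graph.

    edges: iterable of (u, v) pairs of associated domains.
    priors: dict mapping a domain to P(malicious); unlisted domains default to 0.5.
    Returns a dict mapping each domain to its posterior P(malicious).
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = set(priors) | set(neighbors)
    # Node potential phi[n] = (P(benign), P(malicious)).
    phi = {n: (1.0 - priors.get(n, 0.5), priors.get(n, 0.5)) for n in nodes}
    # messages[(i, j)][x]: message from i to neighbor j about j being in state x.
    messages = {(i, j): [1.0, 1.0] for i in nodes for j in neighbors[i]}

    for _ in range(iterations):
        updated = {}
        for i, j in messages:
            msg = []
            for xj in STATES:
                total = 0.0
                for xi in STATES:
                    term = phi[i][xi] * edge_potential(xi, xj, eps)
                    for k in neighbors[i] - {j}:
                        term *= messages[(k, i)][xi]
                    total += term
                msg.append(total)
            z = sum(msg)
            updated[(i, j)] = [m / z for m in msg]  # normalize each message
        messages = updated

    beliefs = {}
    for n in nodes:
        b = list(phi[n])
        for k in neighbors[n]:
            for x in STATES:
                b[x] *= messages[(k, n)][x]
        beliefs[n] = b[1] / (b[0] + b[1])
    return beliefs


edges = [("bad.example", "a.example"), ("a.example", "b.example"), ("c.example", "d.example")]
priors = {"bad.example": 0.99}  # one known-malicious seed domain
beliefs = belief_propagation(edges, priors)
print(round(beliefs["a.example"], 3))  # 0.892
print(round(beliefs["b.example"], 3))  # 0.814
print(beliefs["c.example"])            # 0.5 (no path to the seed)
```

Suspicion decays with distance from the seed along the association chain. On a tree-shaped graph such as this chain, belief propagation is exact; on graphs with cycles the loopy variant is an approximation and may need damping or an explicit convergence check.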

Flow Size Difference Can Make a Difference: Detecting Malicious TCP Network Flows Based on Benford's Law

Aamo Iorliam, Santosh Tirunagari, Anthony T.S. Ho (2016)

Statistical characteristics of network traffic have attracted a significant amount of research for automated network intrusion detection, some of which looked at applications of natural statistical laws such as Zipf's law, Benford's law, and the Pareto distribution. In this paper, we present the application of Benford's law to a new network flow metric, flow size difference, which has not been studied before by other researchers, to build an unsupervised flow-based intrusion detection system (IDS). The method was inspired by our observation, across a large number of TCP flow datasets, that normal flows tend to follow Benford's law closely while malicious flows tend to deviate significantly from it. The proposed IDS is unsupervised, so it can be easily deployed without any training. It has two simple operational parameters with a clear semantic meaning, allowing the IDS operator to set and adapt their values intuitively to adjust the overall performance of the IDS. We tested the proposed IDS on two datasets (one closed and one public) and demonstrated its effectiveness in terms of AUC (area under the ROC curve). Our work shows that flow size difference has great potential to improve the performance of any flow-based network IDS.

Cryptography and Security Artificial Intelligence Networking and Internet Architecture
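Benford's law gives the expected frequency of leading digit d as log10(1 + 1/d), for d = 1 to 9. A minimal sketch comparing the leading-digit distribution of a set of flow size differences against that expectation follows. The abstract defines neither the flow size difference metric nor the IDS's two operational parameters, so the choices here are assumptions: differences are taken between consecutive values of a size sequence, the deviation measure is a mean absolute difference, and `threshold` is a hypothetical cutoff. All function names are illustrative.

```python
import math
from collections import Counter

# Benford's law: P(leading digit = d) = log10(1 + 1/d) for d = 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(n: int) -> int:
    # Leading decimal digit of a nonzero integer.
    return int(str(abs(n))[0])


def flow_size_differences(sizes):
    """Hypothetical reading of 'flow size difference': absolute differences between
    consecutive sizes, dropping zeros because they have no leading digit."""
    return [abs(b - a) for a, b in zip(sizes, sizes[1:]) if b != a]


def benford_deviation(values):
    """Mean absolute deviation between observed leading-digit frequencies and Benford's law."""
    counts = Counter(first_digit(v) for v in values)
    total = sum(counts.values())
    return sum(abs(counts[d] / total - BENFORD[d]) for d in range(1, 10)) / 9


def is_suspicious(sizes, threshold=0.02):
    # Flag a sequence whose size differences deviate from Benford's law by more than threshold.
    diffs = flow_size_differences(sizes)
    if not diffs:
        return False
    return benford_deviation(diffs) > threshold


# Powers of two are a classic Benford-conforming sequence; a lone leading 9 is not.
print(benford_deviation([2 ** n for n in range(1000)]) < 0.01)
print(is_suspicious([0, 9, 104, 1016]))
```

Leading-digit statistics are noisy on small samples, so a test of this kind needs a reasonably large number of values per window before its deviation score is meaningful.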