No Arabic abstract
This paper proposes to develop a network phenotyping mechanism based on network resource usage analysis and identify abnormal network traffic. The network phenotyping may use different metrics in the cyber physical system (CPS), including resource and network usage monitoring, physical state estimation. The set of devices will collectively decide a holistic view of the entire system through advanced image processing and machine learning methods. In this paper, we choose the network traffic pattern as a study case to demonstrate the effectiveness of the proposed method, while the methodology may similarly apply to classification and anomaly detection based on other resource metrics. We apply image processing and machine learning on the network resource usage to extract and recognize communication patterns. The phenotype method is experimented on four real-world decentralized applications. With proper length of sampled continuous network resource usage, the overall recognition accuracy is about 99%. Additionally, the recognition error is used to detect the anomaly network traffic. We simulate the anomaly network resource usage that equals to 10%, 20% and 30% of the normal network resource usage. The experiment results show the proposed anomaly detection method is efficient in detecting each intensity of anomaly network resource usage.
Monitoring network traffic to identify content, services, and applications is an active research topic in network traffic control systems. While modern firewalls provide the capability to decrypt packets, this is not appealing for privacy advocates. Hence, identifying any information from encrypted traffic is a challenging task. Nonetheless, previous work has identified machine learning methods that may enable application and service identification. The process involves high level feature extraction from network packet data then training a robust machine learning classifier for traffic identification. We propose a classification technique using an ensemble of deep learning architectures on packet, payload, and inter-arrival time sequences. To our knowledge, this is the first time such deep learning architectures have been applied to the Server Name Indication (SNI) classification problem. Our ensemble model beats the state of the art machine learning methods and our up-to-date model can be found on github: url{https://github.com/niloofarbayat/NetworkClassification}
We present a method to detect anomalies in a time series of flow interaction patterns. There are many existing methods for anomaly detection in network traffic, such as number of packets. However, there is non established method detecting anomalies in a time series of flow interaction patterns that can be represented as complex network. Firstly, based on proposed multivariate flow similarity method on temporal locality, a complex network model (MFS-TL) is constructed to describe the interactive behaviors of traffic flows. Having analyzed the relationships between MFS-TL characteristics, temporal locality window and multivariate flow similarity critical threshold, an approach for parameter determination is established. Having observed the evolution of MFS-TL characteristics, three non-deterministic correlations are defined for network states (i.e. normal or abnormal). Furthermore, intuitionistic fuzzy set (IFS) is introduced to quantify three non-deterministic correlations, and then a anomaly detection method is put forward for single characteristic sequence. To build an objective IFS, we design a Gaussian distribution-based membership function with a variable hesitation degree. To determine the mapping of IFSs clustering intervals to network states, a distinction index is developed. Then, an IFS ensemble method (IFSE-AD) is proposed to eliminate the impacts of the inconsistent about MFS-TL characteristic to network state and improve detection performance. Finally, we carried out extensive experiments on several network traffic datasets for anomaly detection, and the results demonstrate the superiority of IFSE-AD to state-of-the-art approaches, validating the effectiveness of our method.
Network traffic classification, a task to classify network traffic and identify its type, is the most fundamental step to improve network services and manage modern networks. Classical machine learning and deep learning method have developed well in the field of network traffic classification. However, there are still two major challenges. One is how to protect the privacy of users traffic data, and the other is that it is difficult to obtain labeled data in reality. In this paper, we propose a novel approach using federated semi-supervised learning for network traffic classification. In our approach, the federated servers and several clients work together to train a global classification model. Among them, unlabeled data is used on the client, and labeled data is used on the server. Moreover, we use two traffic subflow sampling methods: simple sampling and incremental sampling for data preprocessing. The experimental results in the QUIC dataset show that the accuracy of our federated semi-supervised approach can reach 91.08% and 97.81% when using the simple sampling method and incremental sampling method respectively. The experimental results also show that the accuracy gap between our method and the centralized training method is minimal, and it can effectively protect users privacy and does not require a large amount of labeled data.
The problem of detecting anomalies in time series from network measurements has been widely studied and is a topic of fundamental importance. Many anomaly detection methods are based on packet inspection collected at the network core routers, with consequent disadvantages in terms of computational cost and privacy. We propose an alternative method in which packet header inspection is not needed. The method is based on the extraction of a normal subspace obtained by the tensor decomposition technique considering the correlation between different metrics. We propose a new approach for online tensor decomposition where changes in the normal subspace can be tracked efficiently. Another advantage of our proposal is the interpretability of the obtained models. The flexibility of the method is illustrated by applying it to two distinct examples, both using actual data collected on residential routers.
This paper addresses network anomography, that is, the problem of inferring network-level anomalies from indirect link measurements. This problem is cast as a low-rank subspace tracking problem for normal flows under incomplete observations, and an outlier detection problem for abnormal flows. Since traffic data is large-scale time-structured data accompanied with noise and outliers under partial observations, an efficient modeling method is essential. To this end, this paper proposes an online subspace tracking of a Hankelized time-structured traffic tensor for normal flows based on the Candecomp/PARAFAC decomposition exploiting the recursive least squares (RLS) algorithm. We estimate abnormal flows as outlier sparse flows via sparsity maximization in the underlying under-constrained linear-inverse problem. A major advantage is that our algorithm estimates normal flows by low-dimensional matrices with time-directional features as well as the spatial correlation of multiple links without using the past observed measurements and the past model parameters. Extensive numerical evaluations show that the proposed algorithm achieves faster convergence per iteration of model approximation, and better volume anomaly detection performance compared to state-of-the-art algorithms.