ترغب بنشر مسار تعليمي؟ اضغط هنا

Building an Effective Intrusion Detection System using Unsupervised Feature Selection in Multi-objective Optimization Framework

67   0   0.0 ( 0 )
 نشر من قبل Chanchal Suman
 تاريخ النشر 2019
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Intrusion Detection Systems (IDS) are developed to protect the network by detecting the attack. The current paper proposes an unsupervised feature selection technique for analyzing the network data. The search capability of the non-dominated sorting genetic algorithm (NSGA-II) has been employed for optimizing three different objective functions utilizing different information theoretic measures including mutual information, standard deviation, and information gain to identify mutually exclusive and a high variant subset of features. Finally, the Pareto optimal front of the different optimal feature subsets are obtained and these feature subsets are utilized for developing classification systems using different popular machine learning models like support vector machines, decision trees and k-nearest neighbour (k=5) classifier etc. We have evaluated the results of the algorithm on KDD-99, NSL-KDD and Kyoto 2006+ datasets. The experimental results on KDD-99 dataset show that decision tree provides better results than other available classifiers. The proposed system obtains the best results of 99.78% accuracy, 99.27% detection rate and false alarm rate of 0.2%, which are better than all the previous results for KDD dataset. We achieved an accuracy of 99.83% for 20% testing data of NSL-KDD dataset and 99.65% accuracy for 10-fold cross-validation on Kyoto dataset. The most attractive characteristic of the proposed scheme is that during the selection of appropriate feature subset, no labeled information is utilized and different feature quality measures are optimized simultaneously using the multi-objective optimization framework.



قيم البحث

اقرأ أيضاً

Mobile ad hoc networking (MANET) has become an exciting and important technology in recent years because of the rapid proliferation of wireless devices. MANETs are highly vulnerable to attacks due to the open medium, dynamically changing network topo logy and lack of centralized monitoring point. It is important to search new architecture and mechanisms to protect the wireless networks and mobile computing application. IDS analyze the network activities by means of audit data and use patterns of well-known attacks or normal profile to detect potential attacks. There are two methods to analyze: misuse detection and anomaly detection. Misuse detection is not effective against unknown attacks and therefore, anomaly detection method is used. In this approach, the audit data is collected from each mobile node after simulating the attack and compared with the normal behavior of the system. If there is any deviation from normal behavior then the event is considered as an attack. Some of the features of collected audit data may be redundant or contribute little to the detection process. So it is essential to select the important features to increase the detection rate. This paper focuses on implementing two feature selection methods namely, markov blanket discovery and genetic algorithm. In genetic algorithm, bayesian network is constructed over the collected features and fitness function is calculated. Based on the fitness value the features are selected. Markov blanket discovery also uses bayesian network and the features are selected depending on the minimum description length. During the evaluation phase, the performances of both approaches are compared based on detection rate and false alarm rate.
143 - Weiyu Chen , Hisao Ishibuchi , 2021
Subset selection is an interesting and important topic in the field of evolutionary multi-objective optimization (EMO). Especially, in an EMO algorithm with an unbounded external archive, subset selection is an essential post-processing procedure to select a pre-specified number of solutions as the final result. In this paper, we discuss the efficiency of greedy subset selection for the hypervolume, IGD and IGD+ indicators. Greedy algorithms usually efficiently handle subset selection. However, when a large number of solutions are given (e.g., subset selection from tens of thousands of solutions in an unbounded external archive), they often become time-consuming. Our idea is to use the submodular property, which is known for the hypervolume indicator, to improve their efficiency. First, we prove that the IGD and IGD+ indicators are also submodular. Next, based on the submodular property, we propose an efficient greedy inclusion algorithm for each indicator. Then, we demonstrate through computational experiments that the proposed algorithms are much faster than the standard greedy subset selection algorithms.
107 - Weizhen Hu , Min Jiang , Xing Gao 2019
The main feature of the Dynamic Multi-objective Optimization Problems (DMOPs) is that optimization objective functions will change with times or environments. One of the promising approaches for solving the DMOPs is reusing the obtained Pareto optima l set (POS) to train prediction models via machine learning approaches. In this paper, we train an Incremental Support Vector Machine (ISVM) classifier with the past POS, and then the solutions of the DMOP we want to solve at the next moment are filtered through the trained ISVM classifier. A high-quality initial population will be generated by the ISVM classifier, and a variety of different types of population-based dynamic multi-objective optimization algorithms can benefit from the population. To verify this idea, we incorporate the proposed approach into three evolutionary algorithms, the multi-objective particle swarm optimization(MOPSO), Nondominated Sorting Genetic Algorithm II (NSGA-II), and the Regularity Model-based multi-objective estimation of distribution algorithm(RE-MEDA). We employ experiments to test these algorithms, and experimental results show the effectiveness.
Application of the multi-objective particle swarm optimisation (MOPSO) algorithm to design of water distribution systems is described. An earlier MOPSO algorithm is augmented with (a) local search, (b) a modified strategy for assigning the leader, an d (c) a modified mutation scheme. For one of the benchmark problems described in the literature, the effect of each of the above features on the algorithm performance is demonstrated. The augmented MOPSO algorithm (called MOPSO+) is applied to five benchmark problems, and in each case, it finds non-dominated solutions not reported earlier. In addition, for the purpose of comparing Pareto fronts (sets of non-dominated solutions) obtained by different algorithms, a new criterion is suggested, and its usefulness is pointed out with an example. Finally, some suggestions regarding future research directions are made.
Both feature selection and hyperparameter tuning are key tasks in machine learning. Hyperparameter tuning is often useful to increase model performance, while feature selection is undertaken to attain sparse models. Sparsity may yield better model in terpretability and lower cost of data acquisition, data handling and model inference. While sparsity may have a beneficial or detrimental effect on predictive performance, a small drop in performance may be acceptable in return for a substantial gain in sparseness. We therefore treat feature selection as a multi-objective optimization task. We perform hyperparameter tuning and feature selection simultaneously because the choice of features of a model may influence what hyperparameters perform well. We present, benchmark, and compare two different approaches for multi-objective joint hyperparameter optimization and feature selection: The first uses multi-objective model-based optimization. The second is an evolutionary NSGA-II-based wrapper approach to feature selection which incorporates specialized sampling, mutation and recombination operators. Both methods make use of parameterized filter ensembles. While model-based optimization needs fewer objective evaluations to achieve good performance, it incurs computational overhead compared to the NSGA-II, so the preferred choice depends on the cost of evaluating a model on given data.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا