In recent years, deep learning has gained growing popularity in cybersecurity applications because, compared with traditional machine learning, it usually involves less human effort, produces better results, and generalizes better. However, imbalanced data is very common in cybersecurity and can substantially degrade the performance of deep learning models. This paper introduces a transfer-learning-based method to tackle the imbalanced data issue in cybersecurity, using Return-Oriented Programming (ROP) payload detection as a case study. We achieved an average false positive rate of 0.033, an average F1 score of 0.9718, and an average detection rate of 0.9418 on 3 different target domain programs using 2 different source domain programs, with 0 benign training data samples in the target domain. Compared to the baseline, the improvement comes as a trade-off between false positive rate and detection rate: using our approach, the number of false positives is reduced by 23.20%, while the number of detected malicious samples is reduced by 0.50%.
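The three metrics reported above are all derived from the same confusion-matrix counts, which is why lowering false positives tends to cost some detections. The following is a minimal sketch of the standard definitions, not the authors' evaluation code; the function name and the example counts are hypothetical and are not taken from the paper.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return (false positive rate, detection rate, F1) from confusion-matrix counts.

    tp/fn count malicious samples that were detected/missed;
    fp/tn count benign samples that were flagged/passed.
    """
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0  # also called recall
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + detection_rate
    f1 = 2 * precision * detection_rate / denom if denom else 0.0
    return fpr, detection_rate, f1


# Hypothetical counts, chosen only to illustrate the arithmetic.
fpr, detection_rate, f1 = detection_metrics(tp=90, fp=5, tn=95, fn=10)
```

With these illustrative counts, 5 of 100 benign samples are flagged and 90 of 100 malicious samples are detected; moving the decision threshold changes both numbers at once.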
Several machine learning techniques for accurate detection of skin cancer from medical images have been reported. Many of these techniques are based on pre-trained convolutional neural networks (CNNs), which enable training the models based on limited amounts of training data. However, the classification accuracy of these models still tends to be severely limited by the scarcity of representative images from malignant tumours. We propose a novel ensemble-based CNN architecture where multiple CNN models, some of which are pre-trained and some of which are trained only on the data at hand, along with auxiliary data in the form of metadata associated with the input images, are combined using a meta-learner. The proposed approach improves the model's ability to handle limited and imbalanced data. We demonstrate the benefits of the proposed technique using a dataset with 33126 dermoscopic images from 2056 patients. We evaluate the performance of the proposed technique in terms of the F1-measure, area under the ROC curve (AUC-ROC), and area under the PR-curve (AUC-PR), and compare it with that of seven different benchmark methods, including two recent CNN-based techniques. The proposed technique compares favourably in terms of all the evaluation metrics.
Return-Oriented Programming (ROP) is a software exploit for system compromise. By chaining short instruction sequences from existing code pieces, ROP can bypass static code-integrity checking approaches and non-executable page protections. Existing defenses either require access to source code or binary, a customized compiler or hardware modifications, or suffer from high performance and storage overhead. In this work, we propose SIGDROP, a low-cost approach for ROP detection which uses low-level properties inherent to ROP attacks. Specifically, we observe special patterns of certain hardware events when a ROP attack occurs during program execution. Such hardware event-based patterns form signatures to flag ROP attacks at runtime. SIGDROP leverages Hardware Performance Counters, which are already present in commodity processors, to efficiently capture and extract the signatures. Our evaluation demonstrates that SIGDROP can effectively detect ROP attacks with acceptable performance overhead and negligible storage overhead.
Network intrusion is a well-studied area of cyber security. Current machine learning-based network intrusion detection systems (NIDSs) monitor network data and the patterns within those data, but in doing so they raise significant privacy concerns for end users. Therefore, to mitigate risk and preserve a balance between security and privacy, it is imperative to protect user privacy with respect to intrusion data. Moreover, cost is a key consideration for a machine learning-based NIDS because such systems are increasingly being deployed on resource-limited edge devices. To address these issues, in this paper we propose a NIDS called PCC-LSM-NIDS that is composed of a Pearson Correlation Coefficient (PCC) based feature selection algorithm and a Least Square Method (LSM) based privacy-preserving algorithm to achieve low-cost intrusion detection while providing privacy preservation for sensitive data. The proposed PCC-LSM-NIDS is tested on the benchmark intrusion database UNSW-NB15, using five popular classifiers. The experimental results show that the proposed PCC-LSM-NIDS reduces computational time while offering an appropriate degree of privacy protection.
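To make the feature-selection idea concrete, the following is a minimal sketch of one common form of PCC-based selection: keep the features whose absolute Pearson correlation with the class label reaches a threshold. The abstract does not state the actual criterion used in PCC-LSM-NIDS, so the feature-to-label comparison, the threshold value, the function names, and the toy data are all assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, labels, threshold=0.5):
    """Return names of features with |PCC(feature, label)| >= threshold.

    columns maps a feature name to its list of values (hypothetical layout).
    """
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, labels)) >= threshold
    ]


# Toy data, not drawn from UNSW-NB15: 0 = benign, 1 = attack.
labels = [0, 0, 1, 1]
columns = {
    "duration": [1, 2, 8, 9],  # rises with the label
    "noise": [5, 1, 5, 1],     # uncorrelated with the label
    "constant": [3, 3, 3, 3],  # zero variance
}
selected = select_features(columns, labels)
```

Dropping weakly correlated features before training is one way such a system could cut computation on resource-limited devices, since every downstream classifier then sees fewer input columns.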
Cloud computing has become a powerful and indispensable technology for complex, high performance and scalable computation. The exponential expansion in the deployment of cloud technology has produced a massive amount of data from a variety of applications, resources and platforms. In turn, the rapid rate and volume of data creation have begun to pose significant challenges for data management and security. The design and deployment of intrusion detection systems (IDS) in the big data setting has, therefore, become a topic of importance. In this paper, we conduct a systematic literature review (SLR) of data mining techniques (DMT) used in IDS-based solutions over the period 2013-2018. We employed criterion-based, purposive sampling, identifying 32 articles, which constitute the primary source of the present survey. After a careful investigation of these articles, we identified 17 separate DMTs deployed in an IDS context. This paper also presents the merits and disadvantages of current research works that implemented DMTs and distributed streaming frameworks (DSF) to detect and/or prevent malicious attacks in a big data environment.
WhatsApp is a popular messaging app used by over a billion users around the globe. Due to this popularity, spam on WhatsApp is an important issue. Despite this, the distribution of spam via WhatsApp remains understudied by researchers, in part because of the end-to-end encryption offered by the platform. This paper addresses this gap by studying spam on a dataset of 2.6 million messages sent to 5,051 public WhatsApp groups in India over 300 days. First, we characterise spam content shared within public groups and find that nearly 1 in 10 messages is spam. We observe a wide selection of topics ranging from job ads to adult content, and find that spammers post both URLs and phone numbers to promote material. Second, we inspect the nature of spammers themselves. We find that spam is often disseminated by groups of phone numbers, and that spam messages are generally shared for a longer duration than non-spam messages. Finally, we devise content-based and activity-based detection algorithms that can counter spam.