ﻻ يوجد ملخص باللغة العربية
We present and evaluate a large-scale malware detection system integrating machine learning with expert reviewers, treating reviewers as a limited labeling resource. We demonstrate that even in small numbers, reviewers can vastly improve the systems ability to keep pace with evolving threats. We conduct our evaluation on a sample of VirusTotal submissions spanning 2.5 years and containing 1.1 million binaries with 778GB of raw feature data. Without reviewer assistance, we achieve 72% detection at a 0.5% false positive rate, performing comparable to the best vendors on VirusTotal. Given a budget of 80 accurate reviews daily, we improve detection to 89% and are able to detect 42% of malicious binaries undetected upon initial submission to VirusTotal. Additionally, we identify a previously unnoticed temporal inconsistency in the labeling of training datasets. We compare the impact of training labels obtained at the same time training data is first seen with training labels obtained months later. We find that using training labels obtained well after samples appear, and thus unavailable in practice for current training data, inflates measured detection by almost 20 percentage points. We release our cluster-based implementation, as well as a list of all hashes in our evaluation and 3% of our entire dataset.
Currently, Android malware detection is mostly performed on server side against the increasing number of malware. Powerful computing resource provides more exhaustive protection for app markets than maintaining detection by a single user. However, ap
Large software platforms (e.g., mobile app stores, social media, email service providers) must ensure that files on their platform do not contain malicious code. Platform hosts use security tools to analyze those files for potential malware. However,
Deep learning has been used in the research of malware analysis. Most classification methods use either static analysis features or dynamic analysis features for malware family classification, and rarely combine them as classification features and al
Cybersecurity continues to be a difficult issue for society especially as the number of networked systems grows. Techniques to protect these systems range from rules-based to artificial intelligence-based intrusion detection systems and anti-virus to
Malware remains a big threat to cyber security, calling for machine learning based malware detection. While promising, such detectors are known to be vulnerable to evasion attacks. Ensemble learning typically facilitates countermeasures, while attack