ﻻ يوجد ملخص باللغة العربية
Automated detection of software vulnerabilities is a fundamental problem in software security. Existing program analysis techniques either suffer from high false positives or false negatives. Recent progress in Deep Learning (DL) has resulted in a surge of interest in applying DL for automated vulnerability detection. Several recent studies have demonstrated promising results achieving an accuracy of up to 95% at detecting vulnerabilities. In this paper, we ask, how well do the state-of-the-art DL-based techniques perform in a real-world vulnerability prediction scenario?. To our surprise, we find that their performance drops by more than 50%. A systematic investigation of what causes such precipitous performance drop reveals that existing DL-based vulnerability prediction approaches suffer from challenges with the training data (e.g., data duplication, unrealistic distribution of vulnerable classes, etc.) and with the model choices (e.g., simple token-based models). As a result, these approaches often do not learn features related to the actual cause of the vulnerabilities. Instead, they learn unrelated artifacts from the dataset (e.g., specific variable/function names, etc.). Leveraging these empirical findings, we demonstrate how a more principled approach to data collection and model design, based on realistic settings of vulnerability prediction, can lead to better solutions. The resulting tools perform significantly better than the studied baseline: up to 33.57% boost in precision and 128.38% boost in recall compared to the best performing model in the literature. Overall, this paper elucidates existing DL-based vulnerability prediction systems potential issues and draws a roadmap for future DL-based vulnerability prediction research. In that spirit, we make available all the artifacts supporting our results: https://git.io/Jf6IA.
Developers create software branches for tentative feature addition and bug fixing, and periodically merge branches to release software with new features or repairing patches. When the program edits from different branches textually overlap (i.e., tex
Embodied instruction following is a challenging problem requiring an agent to infer a sequence of primitive actions to achieve a goal environment state from complex language and visual inputs. Action Learning From Realistic Environments and Directive
In Cyberspace nowadays, there is a burst of information that everyone has access. However, apart from the advantages the Internet offers, it also hides numerous dangers for both people and nations. Cyberspace has a dark side, including terrorism, bul
Decade-long timing observations of arrays of millisecond pulsars have placed highly constraining upper limits on the amplitude of the nanohertz gravitational-wave stochastic signal from the mergers of supermassive black-hole binaries ($sim 10^{-15}$
Accurate molecular crystal structure prediction is a fundamental goal in academic and industrial condensed matter research and polymorphism is arguably the biggest obstacle on the way. We tackle this challenge in the difficult case of the repeatedly