Automated Identification of Security Discussions in Microservices Systems: Industrial Surveys and Experiments


Abstract in English

Lack of awareness and knowledge of microservices-specific security challenges and solutions often leads to ill-informed security decisions in microservices system development. We claim that identifying and leveraging security discussions scattered in existing microservices systems can partially close this gap. We define security discussion as a paragraph from developer discussions that includes design decisions, challenges, or solutions relating to security. We first surveyed 67 practitioners and found that securing microservices systems is a unique challenge and that having access to security discussions is useful for making security decisions. The survey also confirms the usefulness of potential tools that can automatically identify such security discussions. We developed fifteen machine/deep learning models to automatically identify security discussions. We applied these models on a manually constructed dataset consisting of 4,813 security discussions and 12,464 non-security discussions. We found that all the models can effectively identify security discussions: an average precision of 84.86%, recall of 72.80%, F1-score of 77.89%, AUC of 83.75% and G-mean 82.77%. DeepM1, a deep learning model, performs the best, achieving above 84% in all metrics and significantly outperforms three baselines. Finally, the practitioners feedback collected from a validation survey reveals that security discussions identified by DeepM1 have promising applications in practice.

Download