Formal Verification (FV) and Machine Learning (ML) can seem incompatible because of their opposing mathematical foundations and the different ways they are applied to real-life problems: FV mostly relies on discrete mathematics and aims at ensuring correctness, while ML often relies on probabilistic models and learns patterns from training data. In this paper, we postulate that they are complementary in practice, and explore how ML helps FV in its classical approaches: static analysis, model checking, theorem proving, and SAT solving. We draw a landscape of current practice and catalog some of the most prominent uses of ML inside FV tools, offering a new perspective on FV techniques that can help researchers and practitioners locate possible synergies. We discuss lessons learned from our work, point to possible improvements, and offer visions for the future of the domain in light of the science of software and systems modeling.
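To make "ML inside FV tools" concrete, here is one recurring pattern that such surveys cover: learned algorithm selection in the spirit of portfolio SAT solvers such as SATzilla. The features, labels, and solver names below are entirely made up for illustration and are not taken from the paper.

```python
# Hedged sketch of learned algorithm selection for SAT solving
# (illustrative only; features, labels, and solver names are invented).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Made-up per-formula features: [num_vars, num_clauses, clause/var ratio].
features = np.array([
    [50,   210, 4.2],
    [300, 1280, 4.3],
    [40,    90, 2.2],
    [500, 1100, 2.2],
])
# Made-up labels: which of two hypothetical solver configurations was fastest.
best_solver = np.array(["cdcl_restarts", "cdcl_restarts", "dpll_plain", "dpll_plain"])

# Train a small model to dispatch new formulas to a solver configuration.
selector = DecisionTreeClassifier().fit(features, best_solver)

new_formula = np.array([[120, 520, 4.3]])        # features of an unseen formula
print("dispatch to:", selector.predict(new_formula)[0])
```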
In recent years, data and computing resources have increasingly been distributed across end-user devices and across regions or organizations. Because of laws and regulations, these distributed data and computing resources often cannot be shared directly among regions or organizations for machine learning tasks. Federated learning (FL) has emerged as an effective approach for exploiting distributed data and computing resources to collaboratively train machine learning models, while complying with laws and regulations and preserving data security and privacy. In this paper, we provide a comprehensive survey of existing work on federated learning. We propose a functional architecture of federated learning systems and a taxonomy of related techniques. Furthermore, we discuss the distributed training, data communication, and security aspects of FL systems. Finally, we analyze the limitations of existing approaches and propose future research directions.
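As a hedged illustration of the collaborative-training idea surveyed here (the paper itself does not prescribe this protocol), the following minimal federated-averaging (FedAvg-style) sketch has each client fit a model on its own private data and share only weights with the server; all names and data are invented for the example.

```python
# Minimal FedAvg-style sketch (illustrative, not from the surveyed paper):
# each client runs gradient descent on its own data and shares only its
# model weights; the server averages them, so raw data never leaves a client.
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local gradient-descent pass on its private data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)    # gradient of the MSE loss
        w -= lr * grad
    return w

def federated_round(global_w, clients):
    """Server broadcasts global_w, clients train locally, server averages."""
    local_ws = [local_update(global_w, X, y) for X, y in clients]
    return np.mean(local_ws, axis=0)             # equal-size clients: plain mean

# Three clients with private data drawn from the same underlying model.
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + 0.1 * rng.normal(size=50)
    clients.append((X, y))

w = np.zeros(2)
for _ in range(20):                               # 20 communication rounds
    w = federated_round(w, clients)
print("estimated weights:", w)                    # close to [2, -1]
```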
Cyber-Physical Systems (CPS) are characterized by their ability to integrate the physical world with the information, or cyber, world. Their deployment in critical infrastructure has demonstrated a potential to transform the world. However, harnessing this potential is limited by their critical nature and the far-reaching effects of cyber attacks on humans, infrastructure, and the environment. A major source of cyber concerns in CPS arises from sending information from sensors to actuators over wireless communication media, which widens the attack surface. Traditionally, CPS security has been investigated from the perspective of preventing intruders from gaining access to the system using cryptography and other access control techniques, and most research work has therefore focused on the detection of attacks in CPS. However, as adversaries grow in number and capability, it is becoming more difficult to prevent attacks on CPS entirely, hence the need to focus on making CPS resilient. Resilient CPS are designed to withstand disruptions and remain functional despite the operation of adversaries. One of the dominant methodologies explored for building resilient CPS relies on machine learning (ML) algorithms. However, drawing on recent research in adversarial ML, we posit that ML algorithms for securing CPS must themselves be resilient. This paper therefore comprehensively surveys the interplay between resilient CPS built with ML and resilient ML as applied in CPS. The paper concludes with a number of research trends and promising future research directions. With this paper, readers can gain a thorough understanding of recent advances in ML-based security for CPS, in securing ML itself, and in the corresponding countermeasures, as well as of research trends in this active research area.
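To illustrate the adversarial-ML concern the survey raises (this is not an example from the paper), the sketch below applies a fast-gradient-sign-style perturbation to a toy, hand-weighted logistic-regression classifier; the weights, input, and epsilon are all made up.

```python
# Illustrative FGSM-style attack on a toy logistic-regression model
# (made-up weights, input, and epsilon; sketches the adversarial-ML concern only).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A "trained" linear classifier w.x + b (weights chosen by hand).
w = np.array([1.5, -2.0])
b = 0.1

def predict(x):
    return sigmoid(w @ x + b)                     # probability of class 1

# A correctly classified input with true label y = 1.
x = np.array([1.0, -0.5])
print("clean prediction:", predict(x))            # ~0.93, confidently class 1

# FGSM: step along the sign of the loss gradient w.r.t. the input.
# For logistic loss with true label y = 1, d(loss)/dx = (p - 1) * w.
y = 1.0
grad_x = (predict(x) - y) * w
eps = 0.8                                         # large eps for this toy example
x_adv = x + eps * np.sign(grad_x)                 # bounded L_inf perturbation
print("adversarial prediction:", predict(x_adv))  # ~0.45, crosses the 0.5 boundary
```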
We propose an epistemic approach to formalizing statistical properties of machine learning. Specifically, we introduce a formal model for supervised learning based on a Kripke model in which each possible world corresponds to a possible dataset and modal operators are interpreted as transformation of and testing on datasets. We then formalize various notions of the classification performance, robustness, and fairness of statistical classifiers using our extension of statistical epistemic logic (StatEL). Within this formalization, we show relationships among properties of classifiers, and in particular the relationship between classification performance and robustness. To the best of our knowledge, this is the first work that uses epistemic models and logical formulas to express statistical properties of machine learning, and it could serve as a starting point for developing theories of formal specification of machine learning.
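For readers unfamiliar with the underlying machinery, the following is a generic Kripke-structure sketch in standard modal-logic notation; it is illustrative only and does not reproduce the paper's actual StatEL definitions.

```latex
% Generic Kripke-structure sketch (standard modal-logic notation; not the
% paper's own StatEL definitions). Worlds are possible datasets; R relates a
% dataset to the datasets obtainable from it by the transformations and tests
% under consideration.
\[
  \mathcal{M} = (W, R, V), \qquad
  W = \{\text{possible datasets}\}, \qquad R \subseteq W \times W,
\]
\[
  \mathcal{M}, w \models \Box \varphi
  \quad\Longleftrightarrow\quad
  \forall w' \,\bigl( (w, w') \in R \;\Rightarrow\; \mathcal{M}, w' \models \varphi \bigr).
\]
% With an atomic proposition Acc_{>=theta}(C) holding at a world (dataset)
% iff classifier C attains accuracy at least theta on it, the formula
% \Box Acc_{>=theta}(C) expresses a simple robustness-style claim: C keeps
% that accuracy on every dataset reachable from the current one.
```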
Formal Methods for the Informal Engineer (FMIE) was a workshop held at the Broad Institute of MIT and Harvard in 2021 to explore the potential role of verified software in the biomedical software ecosystem. The motivation for organizing FMIE was the recognition that the life sciences and medicine are undergoing a transition from being passive consumers of software and AI/ML technologies to being fundamental drivers of new platforms, including ones that will need to be mission- and safety-critical. Drawing on conversations leading up to and during the workshop, we make five concrete recommendations to help software leaders organically incorporate tools, techniques, and perspectives from formal methods into their project planning and development trajectories.
Inference attacks against Machine Learning (ML) models allow adversaries to learn information about training data, model parameters, and more. While researchers have studied these attacks thoroughly, they have done so in isolation. We therefore lack a comprehensive picture of the risks they pose, including the scenarios in which they apply, the common factors that influence their performance, the relationships among them, and the effectiveness of defense techniques. In this paper, we fill this gap by presenting a first-of-its-kind holistic risk assessment of different inference attacks against machine learning models. We concentrate on four attacks, namely membership inference, model inversion, attribute inference, and model stealing, and establish a threat model taxonomy. Our extensive experimental evaluation over five model architectures and four datasets shows that the complexity of the training dataset plays an important role in attack performance, and that the effectiveness of model stealing and of membership inference attacks is negatively correlated. We also show that defenses like DP-SGD and knowledge distillation can only hope to mitigate some of the inference attacks. Our analysis relies on modular, reusable software, ML-Doctor, which enables ML model owners to assess the risks of deploying their models and equally serves as a benchmark tool for researchers and practitioners.
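As a hedged, self-contained illustration of the simplest of these attacks (not ML-Doctor's actual pipeline), the sketch below runs a baseline "gap" membership-inference attack that guesses "member" whenever an overfit target model classifies a labelled point correctly.

```python
# Minimal "gap" membership-inference sketch (illustrative only; this is not
# ML-Doctor's pipeline). Overfit models are right on their training data far
# more often than on unseen data, so guessing "member" whenever the model
# gets the label right already beats random guessing (0.5).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with some label noise, so memorization hurts generalization.
X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.1, random_state=0)
X_mem, X_non, y_mem, y_non = train_test_split(X, y, test_size=0.5, random_state=0)

# Target model: an unpruned decision tree, deliberately overfit.
target = DecisionTreeClassifier(random_state=0).fit(X_mem, y_mem)

# Attack: guess "member" iff the model predicts the label correctly.
guess_mem = target.predict(X_mem) == y_mem        # ~1.0 on training ("member") data
guess_non = target.predict(X_non) == y_non        # noticeably lower on unseen data

# Balanced attack accuracy: 0.5 means no leakage, 1.0 means total leakage.
attack_acc = 0.5 * (guess_mem.mean() + (1 - guess_non.mean()))
print(f"membership-inference (gap attack) accuracy: {attack_acc:.2f}")
```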