Host-based anomaly detectors generate alarms by inspecting audit logs for suspicious behavior. Unfortunately, evaluating these anomaly detectors is hard. There are few high-quality, publicly available audit logs, and there are no pre-existing frameworks that enable push-button creation of realistic system traces. To make trace generation easier, we created Xanthus, an automated tool that orchestrates virtual machines to generate realistic audit logs. Using Xanthus's simple management interface, administrators select a base VM image, configure a particular tracing framework to use within that VM, and define post-launch scripts that collect and save trace data. Once data collection is finished, Xanthus creates a self-describing archive, which contains the VM, its configuration parameters, and the collected trace data. We demonstrate that Xanthus hides many of the tedious (yet subtle) orchestration tasks that humans often get wrong; Xanthus avoids mistakes that lead to non-replicable experiments.
Anonymous data collection systems allow users to contribute the data necessary to build services and applications while preserving their privacy. Anonymity, however, can be abused by malicious agents aiming to subvert or sabotage the data collection, for instance by injecting fabricated data. In this paper we propose an efficient mechanism to rate-limit an attacker without compromising the privacy and anonymity of the users contributing data. The proposed system builds on top of Direct Anonymous Attestation, a proven cryptographic primitive. We describe how a set of rate-limiting rules can be formalized to define a normative space in which messages sent by an attacker can be linked and, consequently, dropped. We present all components needed to build and deploy such protection on existing data collection systems with little overhead. Empirical evaluation yields a throughput of up to 125 and 140 messages per second for senders and the collector, respectively, on nominal hardware. Communication latency is bounded by 4 seconds at the 95th percentile when using Tor as the network layer.
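The linking idea described above can be illustrated with a minimal sketch. It is not the paper's construction: a keyed hash (HMAC) stands in for the linkable pseudonym that Direct Anonymous Attestation derives from a basename, and the names `link_tag`, `basename_for`, `Collector`, the epoch/slot basename format, and the quota value are all hypothetical. The sketch shows only why a sender who exceeds the quota must reuse a basename and thereby produce a tag the collector has already seen.

```python
import hashlib
import hmac


def link_tag(secret_key: bytes, basename: str) -> str:
    """Stand-in for a DAA pseudonym: same key + same basename -> same tag.

    Assumption: a real deployment would use a DAA signature, which the
    collector can verify without learning who the sender is. HMAC gives
    only the linkability property, not verifiability.
    """
    return hmac.new(secret_key, basename.encode(), hashlib.sha256).hexdigest()


def basename_for(epoch: int, slot: int, quota: int) -> str:
    """Hypothetical rate-limiting rule: at most `quota` slots per epoch."""
    if not 0 <= slot < quota:
        raise ValueError("slot outside the allowed space for this epoch")
    return f"epoch={epoch};slot={slot}"


class Collector:
    """Drops any message whose (basename, tag) pair was already received."""

    def __init__(self, quota: int) -> None:
        self.quota = quota
        self.seen = set()

    def accept(self, epoch: int, basename: str, tag: str) -> bool:
        # Reject basenames that fall outside the rule-defined space.
        allowed = {basename_for(epoch, s, self.quota) for s in range(self.quota)}
        if basename not in allowed:
            return False
        # A repeated pair means the same sender reused a slot: drop it.
        if (basename, tag) in self.seen:
            return False
        self.seen.add((basename, tag))
        return True
```

With a quota of two, a sender can place two messages in an epoch; a third message has to reuse one of the two basenames, yields an identical tag, and is dropped, while a different sender using the same basename produces an unrelated tag and is accepted.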
The emerging public awareness and government regulation of data privacy motivate new paradigms for collecting and analyzing data that are transparent and acceptable to data owners. We present a new concept of privacy and corresponding data formats, mechanisms, and tradeoffs for privatizing data during data collection. The privacy, named Interval Privacy, enforces the raw data conditional distribution on the privatized data to be the same as its unconditional distribution over a nontrivial support set. Correspondingly, the proposed privacy mechanism records each data value as a random interval containing it. The proposed interval privacy mechanisms can be easily deployed through most existing survey-based data collection paradigms, e.g., by asking a respondent whether their data value is within a randomly generated range. Another unique feature of interval mechanisms is that they obfuscate the truth but do not distort it. Using a narrowed range to convey information is complementary to the popular paradigm of perturbing data. Also, the interval mechanisms can generate progressively refined information at the discretion of individual respondents. We study different theoretical aspects of the proposed privacy. In the context of supervised learning, we also offer a method by which existing supervised learning algorithms designed for point-valued data can be directly applied to learning from interval-valued data.
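A minimal sketch of the interval idea follows, assuming a bounded numeric value and cut points drawn uniformly and independently of the data; the function name `privatize`, its parameters, and the uniform choice of cut points are illustrative assumptions rather than the paper's exact mechanism. The respondent reports only the interval between adjacent cut points that contains the true value, so the report is always truthful but never exact.

```python
import random
from bisect import bisect_right


def privatize(value, low, high, num_cuts=1, rng=random):
    """Return a random interval (left, right) that contains `value`.

    Assumptions: `value` lies in [low, high], and the cut points are
    drawn uniformly on [low, high] without looking at `value`.
    """
    if not low <= value <= high:
        raise ValueError("value must lie within [low, high]")
    # Random cut points, generated independently of the data.
    cuts = sorted(rng.uniform(low, high) for _ in range(num_cuts))
    edges = [low] + cuts + [high]
    # Number of cut points <= value selects the enclosing interval.
    i = bisect_right(cuts, value)
    return edges[i], edges[i + 1]


if __name__ == "__main__":
    # One cut point corresponds to a single yes/no survey question:
    # "Is your value below the randomly generated threshold?"
    print(privatize(42.0, 0.0, 100.0, num_cuts=1))
    # More cut points give a narrower, more informative interval.
    print(privatize(42.0, 0.0, 100.0, num_cuts=4))
```

Raising `num_cuts` mirrors the progressive refinement mentioned above: a respondent who is willing to answer more questions reveals a narrower interval.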
Data provenance collects comprehensive information about the events and operations in a computer system at both application and system levels. It provides a detailed and accurate history of transactions that helps delineate the data flow across the whole system. Data provenance helps achieve system resilience by uncovering traces of malicious attacks after a system compromise, which an analyzer can use to understand the attack behavior and assess the level of damage. Existing literature demonstrates a number of research efforts on information capture, management, and analysis of data provenance. In recent years, provenance in IoT devices has attracted several research efforts because of the proliferation of commodity IoT devices. In this survey paper, we present a comparative study of the state-of-the-art approaches to provenance by classifying them based on frameworks, deployed techniques, and subjects of interest. We also discuss the emergence and scope of data provenance in IoT networks. Finally, we highlight several directions that data provenance research urgently needs to pursue, including data management and analysis.
Local Differential Privacy (LDP) is popularly used in practice for privacy-preserving data collection. Although existing LDP protocols offer high utility for large user populations (100,000 or more users), they perform poorly in scenarios with small user populations (such as those in the cybersecurity domain) and lack perturbation mechanisms that are effective for both ordinal and non-ordinal item sequences while protecting sequence length and content simultaneously. In this paper, we address the small user population problem by introducing the concept of Condensed Local Differential Privacy (CLDP) as a specialization of LDP, and develop a suite of CLDP protocols that offer desirable statistical utility while preserving privacy. Our protocols support different types of client data, ranging from ordinal data types in finite metric spaces (numeric malware infection statistics), to non-ordinal items (O
Cyber-Physical Systems (CPSs) are increasingly important in critical areas of our society such as intelligent power grids, next generation mobile devices, and smart buildings. CPS operation is characterized by considerable heterogeneity, variable dynamics, and high complexity. These systems also have scarce resources with which to satisfy their entire load demand, which can be divided into data processing and service execution. These new characteristics of CPSs need to be managed with novel strategies to ensure their resilient operation. Towards this goal, we propose an SDN-based solution enhanced by distributed Network Function Virtualization (NFV) modules located at the top-most level of our solution architecture. These NFV agents will take orchestrated management decisions among themselves to ensure a CPS configuration that is resilient against threats and an optimum operation of the CPS. For this, we study and compare two distinct incentive mechanisms to enforce cooperation among NFVs. Thus, we aim to offer novel perspectives on the management of resilient CPSs, embedding IoT devices, modeled by Game Theory (GT), using the latest software and virtualization platforms.