Statistical approaches to cyber-security involve building realistic probability models of computer network data. In a data pre-processing phase, separating automated events from those caused by human activity should improve statistical model building and enhance anomaly detection capabilities. This article presents a changepoint detection framework for identifying periodic subsequences of event times. The opening event of each subsequence can be interpreted as a human action which then generates an automated, periodic process. Difficulties arising from the presence of duplicate and missing data are addressed. The methodology is demonstrated using authentication data from the computer network of Los Alamos National Laboratory.
Let $W^{(n)}$ be the $n$-letter word obtained by repeating a fixed word $W$, and let $R_n$ be a random $n$-letter word over the same alphabet. We show several results about the length of the longest common subsequence (LCS) between $W^{(n)}$ and $R_n$; in particular, we show that its expectation is $\gamma_W n - O(\sqrt{n})$ for an efficiently computable constant $\gamma_W$. This is done by relating the problem to a new interacting particle system, which we dub frog dynamics. In this system, the particles ("frogs") hop over one another in the order given by their labels. Stripped of the labeling, the frog dynamics reduces to a variant of the PushTASEP. In the special case when all symbols of $W$ are distinct, we obtain an explicit formula for the constant $\gamma_W$ and a closed-form expression for the stationary distribution of the associated frog dynamics. In addition, we propose new conjectures about the asymptotics of the LCS of a pair of random words. These conjectures are informed by computer experiments using a new heuristic algorithm to compute the LCS. Through our computations, we found periodic words that are more random-like than a random word, as measured by the LCS.
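The quantity studied above can be explored numerically with a minimal sketch. The code below uses the textbook quadratic-time dynamic program for the LCS, not the heuristic algorithm or the frog dynamics mentioned in the abstract. The function names (`repeat_word`, `lcs_length`, `estimate_ratio`) are hypothetical, and two assumptions are made that the abstract does not state: the alphabet is taken to be the set of letters occurring in $W$, and $R_n$ is drawn with independent, uniformly distributed letters.

```python
import random


def repeat_word(w: str, n: int) -> str:
    """Return the n-letter word obtained by repeating the non-empty word w."""
    reps = n // len(w) + 1
    return (w * reps)[:n]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (standard DP)."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0]
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def estimate_ratio(w: str, n: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of E[LCS(W^(n), R_n)] / n.

    Assumption (not from the source): R_n has i.i.d. uniform letters
    drawn from the set of letters that occur in w.
    """
    rng = random.Random(seed)
    alphabet = sorted(set(w))
    periodic = repeat_word(w, n)
    total = 0
    for _ in range(trials):
        r = "".join(rng.choice(alphabet) for _ in range(n))
        total += lcs_length(periodic, r)
    return total / (trials * n)
```

For example, `estimate_ratio("ab", 200, 20)` gives a finite-$n$ estimate of the ratio for $W = ab$. Since the expectation is $\gamma_W n - O(\sqrt{n})$, such an estimate would be expected to sit slightly below $\gamma_W$ at moderate $n$.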
It is challenging to assess the vulnerability of a cyber-physical power system to data attacks from an integral perspective. To support vulnerability assessment beyond analytic analysis, a suitable platform for security tests needs to be developed. In this paper we analyze the cyber security of the energy management system (EMS) against data attacks. First, we extend our analytic framework, which characterizes data attacks as optimization problems with objectives specified as security metrics and constraints corresponding to the communication network properties. Second, we build a co-simulation platform that couples the power system simulator DIgSILENT PowerFactory with the communication network simulator OMNeT++ and with Matlab for EMS applications (state estimation, optimal power flow). The framework is then used to conduct attack simulations on the co-simulation platform for a power grid test case. The results indicate how vulnerable the EMS is to data attacks and how co-simulation can help assess vulnerability.
Timely analysis of cyber-security information necessitates automated information extraction from unstructured text. While state-of-the-art extraction methods produce extremely accurate results, they require ample training data, which is generally unavailable for specialized applications such as detecting security-related entities; moreover, manual annotation of corpora is very costly and often not a viable solution. In response, we develop a very precise method to automatically label text from several data sources by leveraging related, domain-specific, structured data, and we provide public access to a corpus annotated with cyber-security entities. Next, we implement a Maximum Entropy Model trained with the averaged perceptron on a portion of our corpus ($\sim$750,000 words) and achieve near-perfect precision, recall, and accuracy, with training times under 17 seconds.
We consider malicious attacks on actuators and sensors of a feedback system which can be modeled as additive, possibly unbounded, disturbances at the digital (cyber) part of the feedback loop. We precisely characterize the role of the unstable poles and zeros of the system in the ability to detect stealthy attacks in the context of the sampled-data implementation of the controller in feedback with the continuous (physical) plant. We show that, if there is a single sensor that is guaranteed to be secure and the plant is observable from that sensor, then there exists a class of multirate sampled-data controllers that ensure that all attacks remain detectable. These dual-rate controllers sample the output faster than the rate of the zero-order hold that operates on the control input, and as such they can even provide better nominal performance than single-rate controllers, at the price of a higher sampling rate on the continuous output.
The variety of communication technologies and mobility features in the Internet of Things (IoT) enables fruitful and attractive applications on the one hand, but on the other hand facilitates malware propagation, thereby raising new challenges in handling IoT-empowered malware for cyber security. Compared with malware propagation control schemes in traditional wireless networks, where nodes can be directly repaired and secured, compromised end devices in IoT are difficult to patch. Blocking malware by patching intermediate nodes is therefore a more feasible and practical alternative. Specifically, patching intermediate nodes can effectively curb malware propagation by securing infrastructure links and limiting the malware to local device-to-device dissemination. This article proposes a novel traffic-aware patching scheme that selects important intermediate nodes to patch and applies to IoT systems with limited patching resources and response time constraints. Experiments on real-world trace datasets in IoT networks demonstrate the advantage of the proposed traffic-aware patching scheme in alleviating malware propagation.