We present a hierarchical simulation approach for the dependability analysis and evaluation of a highly available commercial cache-based RAID storage system. The architecture is complex and includes several layers of overlapping error detection and recovery mechanisms. Three abstraction levels have been developed to model the cache architecture, cache operations, and error detection and recovery mechanism. The impact of faults and errors occurring in the cache and in the disks is analyzed at each level of the hierarchy. A simulation submodel is associated with each abstraction level. The models have been developed using DEPEND, a simulation-based environment for system-level dependability analysis, which provides facilities to inject faults into a functional behavior model, to simulate error detection and recovery mechanisms, and to evaluate quantitative measures. Several fault models are defined for each submodel to simulate cache component failures, disk failures, transmission errors, and data errors in the cache memory and in the disks. Some of the parameters characterizing fault injection in a given submodel correspond to probabilities evaluated from the simulation of the lower-level submodel. Based on the proposed methodology, we evaluate and analyze 1) the system behavior under a real workload and high error rate (focusing on error bursts), 2) the coverage of the error detection mechanisms implemented in the system and the error latency distributions, and 3) the accumulation of errors in the cache and in the disks.
For efficiency reasons, software system designers want to use an integrated set of methods and tools to describe specifications and designs, and also to perform dependability, schedulability, and performance analyses. AADL (Architecture Analysis and Design Language) has proved to be efficient for software architecture modeling. In addition, AADL was designed to accommodate several types of analyses. This paper presents an iterative dependency-driven approach for dependability modeling using AADL. It is illustrated on a small example. This approach is part of a complete framework that allows the generation of dependability analysis and evaluation models from AADL models to support the analysis of software and system architectures in critical application domains.
In this paper, we propose an effective and efficient multi-core shared-cache design optimization approach based on reuse-distance analysis of the data traces of target applications. Since data traces are independent of system hardware architectures, a designer can easily compute the best cache design at the early system design phase using our approach. We devise an efficient yet accurate method to derive the aggregated reuse-distance histograms of concurrent applications for cache performance analysis and optimization. Because the actual shared-cache contention results of concurrent applications are embedded in the aggregated reuse-distance histograms, the approach is very effective. The experimental results show that the average error rate of the shared-cache miss-count estimations of our approach is less than 2.4%. Using a simple scanning search method, one can easily determine the true optimal cache configurations at the early system design phase.
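As a minimal illustration of reuse-distance analysis (a sketch, not the authors' aggregation method), the following Python code builds the reuse-distance histogram of a single trace and derives the miss count of a fully associative LRU cache from it. The function names `reuse_distance_histogram` and `lru_misses`, the fully associative LRU assumption, and the toy trace are all assumptions made for this example; aggregating the histograms of concurrent applications is not shown.

```python
import math
from collections import Counter


def reuse_distance_histogram(trace):
    """Return a Counter mapping reuse distance -> number of accesses.

    The reuse distance of an access is the number of distinct addresses
    touched since the previous access to the same address. First-time
    (cold) accesses are recorded under math.inf.
    """
    stack = []  # LRU stack: most recently used address is at the end
    hist = Counter()
    for addr in trace:
        if addr in stack:
            idx = stack.index(addr)
            # Addresses above idx were touched since the last use of addr.
            hist[len(stack) - 1 - idx] += 1
            stack.pop(idx)
        else:
            hist[math.inf] += 1
        stack.append(addr)
    return hist


def lru_misses(hist, capacity):
    """Misses of a fully associative LRU cache holding `capacity` blocks.

    An access hits exactly when its reuse distance is below the capacity.
    """
    return sum(count for dist, count in hist.items() if dist >= capacity)


trace = ["a", "b", "c", "a", "b", "d", "a"]  # hypothetical block addresses
hist = reuse_distance_histogram(trace)
misses_by_capacity = {c: lru_misses(hist, c) for c in (1, 2, 3, 4)}
```

Because the histogram depends only on the trace, one pass over the trace is enough to evaluate every candidate capacity, which is what makes a simple scan over cache sizes cheap in this setting.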
We consider a distributed storage system which stores several hot (popular) and cold (less popular) data files across multiple nodes or servers. Hot files are stored using repetition codes while cold files are stored using erasure codes. The nodes are prone to failure, and hence we assume that only a fraction of the nodes are available at any given time. Using a cavity-process-based mean-field framework, we analyze the download time for users accessing hot or cold data in the presence of failed nodes. Our work also illustrates the impact of the choice of the storage code on the download time performance of users in the system.
The paper presents techniques for analyzing the expected download time in distributed storage systems that employ systematic availability codes. These codes provide access to hot data through the systematic server containing the object and multiple recovery groups. When a request for an object is received, it can be replicated (forked) to the systematic server and all recovery groups. We first consider the low-traffic regime and present a closed-form expression for the download time. By comparison across systems with availability, maximum distance separable (MDS), and replication codes, we demonstrate that availability codes can reduce download time in some settings but are not always optimal. In the high-traffic regime, the system consists of multiple inter-dependent Fork-Join queues, making exact analysis intractable. Accordingly, we present upper and lower bounds on the download time, and an M/G/1 queue approximation for several cases of interest. Via extensive numerical simulations, we evaluate our bounds and demonstrate that the M/G/1 queue approximation has a high degree of accuracy.
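To make the low-traffic case concrete, here is a small Python sketch under assumptions that the abstract does not state: no queueing, independent exponential service times with rate `mu` at every server, `t` recovery groups of `r` servers each, and a download that completes as soon as either the systematic server or every server of some recovery group has responded. Under these assumptions the download time is the minimum of one exponential and `t` maxima of `r` exponentials, and its mean is (1/mu) · Σ_{j=0..t} C(t, j) (−1)^j / (r·j + 1). The function name is hypothetical, and this expression is not claimed to be the one derived in the paper.

```python
from fractions import Fraction
from math import comb


def low_traffic_download_time(r, t, mu=1):
    """Mean of min(X0, max(group 1), ..., max(group t)) for i.i.d. Exp(mu).

    X0 is the systematic server; each of the t recovery groups needs all
    of its r servers to finish. Derived from the survival function
    P(T > x) = exp(-mu x) * (1 - (1 - exp(-mu x))**r)**t.
    """
    total = sum(
        Fraction((-1) ** j * comb(t, j), r * j + 1) for j in range(t + 1)
    )
    return total / Fraction(mu)


no_groups = low_traffic_download_time(r=3, t=0)      # systematic server only
one_replica = low_traffic_download_time(r=1, t=1)    # two-way replication
one_pair_group = low_traffic_download_time(r=2, t=1)
```

With `t = 0` the result reduces to the plain service time 1/mu, and with `r = 1` it reduces to the mean of the minimum of `t + 1` exponentials, which gives a quick sanity check on the formula.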
Cache prefetching has become the mainstream data access optimization strategy in data centers. However, the rapid growth of unstructured data generates massive numbers of pairwise access relationships, which impose a heavy computational burden on existing prefetching models and severely degrade data access performance. We propose a cache-transaction-based data grouping model (CTDGM) that addresses these problems by optimizing the feature representation method and the grouping efficiency. First, we define the cache transaction and propose a method for extracting the cache transaction feature (CTF). Second, we design a data chunking algorithm based on CTF and spatiotemporal locality to make relationship calculation more efficient. Third, we propose CTDGM, which constructs a relation graph that partitions data into independent groups according to the strength of the data access relation. Experimental results show that, compared with state-of-the-art methods, our algorithm achieves an average increase in the cache hit rate of 12% on the MSR dataset with a small cache size (0.001% of all the data), which in turn reduces the number of data I/O accesses by 50% when the cache size is less than 0.008% of all the data.
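The idea of grouping data through a relation graph can be sketched as follows. This is only an illustration under simplifying assumptions, not the CTDGM algorithm itself: a transaction is taken to be a plain list of data identifiers, relation strength is taken to be the number of transactions in which two items co-occur, and groups are the connected components of the graph whose edges meet a hypothetical `min_strength` threshold.

```python
from collections import Counter
from itertools import combinations


def group_by_co_access(transactions, min_strength=2):
    """Partition items into groups linked by strong co-access relations."""
    strength = Counter()
    items = set()
    for tx in transactions:
        unique = sorted(set(tx))
        items.update(unique)
        # Count each unordered pair once per transaction.
        for a, b in combinations(unique, 2):
            strength[(a, b)] += 1

    # Union-find over items; only sufficiently strong edges merge groups.
    parent = {item: item for item in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), count in strength.items():
        if count >= min_strength:
            parent[find(a)] = find(b)

    groups = {}
    for item in items:
        groups.setdefault(find(item), set()).add(item)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical transactions over data identifiers "a".."e".
transactions = [["a", "b"], ["a", "b", "c"], ["c", "d"], ["c", "d"], ["e"]]
groups = group_by_co_access(transactions, min_strength=2)
```

Counting every pair in every transaction is quadratic in the transaction size, which is the kind of pairwise cost the abstract identifies as a burden; the sketch makes no attempt to reduce it.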