ﻻ يوجد ملخص باللغة العربية
Cache prefetching technology has become the mainstream data access optimization strategy in the data centers. However, the rapidly increasing of unstructured data generates massive pairwise access relationships, which can result in a heavy computational burden for the existing prefetching model and lead to severe degradation in the performance of data access. We propose cache-transaction-based data grouping model (CTDGM) to solve the problems described above by optimizing the feature representation method and grouping efficiency. First, we provide the definition of the cache transaction and propose the method for extracting the cache transaction feature (CTF). Second, we design a data chunking algorithm based on CTF and spatiotemporal locality to optimize the relationship calculation efficiency. Third, we propose CTDGM by constructing a relation graph that groups data into independent groups according to the strength of the data access relation. Based on the results of the experiment, compared with the state-of-the-art methods, our algorithm achieves an average increase in the cache hit rate of 12% on the MSR dataset with small cache size (0.001% of all the data), which in turn reduces the number of data I/O accesses by 50% when the cache size is less than 0.008% of all the data.
Big data systems development is full of challenges in view of the variety of application areas and domains that this technology promises to serve. Typically, fundamental design decisions involved in big data systems design include choosing appropriat
Leader-based data replication improves consistency in highly available distributed storage systems via sequential writes to the leader nodes. After a write has been committed by the leaders, follower nodes are written by a multicast mechanism and are
Erasure codes are increasingly being studied in the context of implementing atomic memory objects in large scale asynchronous distributed storage systems. When compared with the traditional replication based schemes, erasure codes have the potential
Cloud-based enterprise search services (e.g., Amazon Kendra) are enchanting to big data owners by providing them with convenient search solutions over their enterprise big datasets. However, individuals and businesses that deal with confidential big
To achieve reliability in distributed storage systems, data has usually been replicated across different nodes. However the increasing volume of data to be stored has motivated the introduction of erasure codes, a storage efficient alternative to rep