
An Imitation Learning Approach for Cache Replacement

 Added by Evan Liu
 Publication date 2020
Research language: English





Program execution speed critically depends on increasing cache hits, as cache hits are orders of magnitude faster than misses. To increase cache hits, we focus on the problem of cache replacement: choosing which cache line to evict upon inserting a new line. This is challenging because it requires planning far ahead, and there is currently no known practical solution. As a result, current replacement policies typically resort to heuristics designed for specific common access patterns, which fail on more diverse and complex access patterns. In contrast, we propose an imitation learning approach to automatically learn cache access patterns by leveraging Belady's, an oracle policy that computes the optimal eviction decision given the future cache accesses. While directly applying Belady's is infeasible since the future is unknown, we train a policy conditioned only on past accesses that accurately approximates Belady's even on diverse and complex access patterns, and call this approach Parrot. When evaluated on 13 of the most memory-intensive SPEC applications, Parrot increases cache hit rates by 20% over the current state of the art. In addition, on a large-scale web search benchmark, Parrot increases cache hit rates by 61% over a conventional LRU policy. We release a Gym environment to facilitate research in this area, as data is plentiful, and further advancements can have significant real-world impact.
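To make the oracle concrete, the following is a minimal sketch of Belady's eviction rule next to LRU on a toy access trace. It is not Parrot and not the authors' code: the function names `simulate_lru` and `simulate_belady` are hypothetical, and a single fully associative cache with no bypass is a simplifying assumption. It only illustrates why knowing future accesses helps on patterns where a recency heuristic fails.

```python
from collections import OrderedDict


def simulate_lru(accesses, capacity):
    """Count hits for a fully associative cache with LRU eviction."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for line in accesses:
        if line in cache:
            hits += 1
            cache.move_to_end(line)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[line] = True
    return hits


def simulate_belady(accesses, capacity):
    """Count hits when evicting the line whose next use is furthest away.

    This needs the whole future trace, which is why it is an oracle
    rather than a deployable policy.
    """
    # next_use[i] = index of the next access to accesses[i] after i
    next_use = [float("inf")] * len(accesses)
    last_seen = {}
    for i in range(len(accesses) - 1, -1, -1):
        next_use[i] = last_seen.get(accesses[i], float("inf"))
        last_seen[accesses[i]] = i

    cache = {}  # cached line -> index of its next use
    hits = 0
    for i, line in enumerate(accesses):
        if line in cache:
            hits += 1
        elif len(cache) >= capacity:
            victim = max(cache, key=cache.get)  # reused furthest in future
            del cache[victim]
        cache[line] = next_use[i]
    return hits


if __name__ == "__main__":
    trace = list("abcd") * 3  # cyclic pattern larger than the cache
    print("LRU hits:   ", simulate_lru(trace, capacity=3))
    print("Belady hits:", simulate_belady(trace, capacity=3))
```

On this cyclic trace LRU always evicts the line that is needed next, while the oracle keeps most of the working set resident. A learned policy in the spirit of the abstract would have to predict the oracle's choice from past accesses alone.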





Sarwan Ali, 2021
Cache replacement algorithms reduce the time a processor spends waiting for data by keeping the information it needs now, and is likely to need in the future, in the cache so that it can be supplied immediately. A number of techniques (LIFO, FIFO, LRU, MRU, hybrid) are used to organize this information so that the processor stays busy almost all the time, but every technique has limitations, which we try to overcome. We use a probabilistic graphical model (PGM), which expresses conditional dependencies between random variables using a directed or undirected graph. Specifically, we exploit a Bayesian network to predict the processor's future requests. The main goal of the research was to increase the cache hit rate without increasing the cache size, while reducing or at least maintaining the overhead. In the best case, the PGM technique achieved 7% more cache hits than the classical algorithms, which demonstrates the success of our technique as far as cache hits are concerned. Pre-eviction also proves to be a better technique for obtaining more cache hits. Combining pre-eviction and pre-fetching using PGM gives the results this research set out to achieve.
This paper proposes Self-Imitation Learning (SIL), a simple off-policy actor-critic algorithm that learns to reproduce the agent's past good decisions. This algorithm is designed to verify our hypothesis that exploiting past good experiences can indirectly drive deep exploration. Our empirical results show that SIL significantly improves advantage actor-critic (A2C) on several hard exploration Atari games and is competitive with the state-of-the-art count-based exploration methods. We also show that SIL improves proximal policy optimization (PPO) on MuJoCo tasks.
A common strategy in modern learning systems is to learn a representation that is useful for many tasks, a.k.a. representation learning. We study this strategy in the imitation learning setting for Markov decision processes (MDPs) where multiple experts' trajectories are available. We formulate representation learning as a bi-level optimization problem where the outer optimization tries to learn the joint representation and the inner optimization encodes the imitation learning setup and tries to learn task-specific parameters. We instantiate this framework for the imitation learning settings of behavior cloning and observation-alone. Theoretically, we show using our framework that representation learning can provide sample complexity benefits for imitation learning in both settings. We also provide proof-of-concept experiments to verify our theory.
Imitation learning (IL) aims to learn a policy from expert demonstrations that minimizes the discrepancy between the learner and expert behaviors. Various imitation learning algorithms have been proposed with different pre-determined divergences to quantify the discrepancy. This naturally gives rise to the following question: Given a set of expert demonstrations, which divergence can recover the expert policy more accurately with higher data efficiency? In this work, we propose $f$-GAIL, a new generative adversarial imitation learning (GAIL) model, that automatically learns a discrepancy measure from the $f$-divergence family as well as a policy capable of producing expert-like behaviors. Compared with IL baselines with various predefined divergence measures, $f$-GAIL learns better policies with higher data efficiency in six physics-based control tasks.
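For reference, the $f$-divergence family mentioned above is usually defined as follows. This is the standard textbook definition, given here as background and not taken from the $f$-GAIL paper itself; the symbols $P$, $Q$, $p$, $q$ are introduced only for illustration.

```latex
% Standard definition of an f-divergence between distributions P and Q
% with densities p and q, where f is convex and f(1) = 0.
D_f(P \,\|\, Q) = \mathbb{E}_{x \sim Q}\!\left[ f\!\left( \frac{p(x)}{q(x)} \right) \right]

% Example: choosing f(u) = u \log u recovers the KL divergence.
D_{\mathrm{KL}}(P \,\|\, Q) = \mathbb{E}_{x \sim P}\!\left[ \log \frac{p(x)}{q(x)} \right]
```

Different choices of the convex function $f$ give different discrepancy measures, which is the degree of freedom the abstract describes as being learned rather than fixed in advance.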
In recent years, a myriad of advanced results have been reported in the imitation learning community, ranging from parametric to non-parametric, probabilistic to non-probabilistic, and Bayesian to frequentist approaches. Meanwhile, ample applications (e.g., grasping tasks and human-robot collaboration) further show the applicability of imitation learning in a wide range of domains. While much of the literature is dedicated to learning human skills in unconstrained environments, the problem of learning constrained motor skills has not yet received equal attention. In fact, constrained skills exist widely in robotic systems. For instance, when a robot is required to write letters on a board, its end-effector trajectory must comply with the plane constraint from the board. In this paper, we aim to tackle the problem of imitation learning with linear constraints. Specifically, we propose to exploit the probabilistic properties of multiple demonstrations, and subsequently incorporate them into a linearly constrained optimization problem, which finally leads to a non-parametric solution. In addition, a connection between our framework and classical model predictive control is provided. Several examples, including simulated writing and locomotion tasks, are presented to show the effectiveness of our framework.
