
A Theoretical Connection Between Statistical Physics and Reinforcement Learning

Posted by Jad Rahme
Publication date: 2019
Paper language: English





Sequential decision making in the presence of uncertainty and stochastic dynamics gives rise to distributions over state/action trajectories in reinforcement learning (RL) and optimal control problems. This observation has led to a variety of connections between RL and inference in probabilistic graphical models (PGMs). Here we explore a different dimension to this relationship, examining reinforcement learning using the tools and abstractions of statistical physics. The central object in the statistical physics abstraction is the idea of a partition function $\mathcal{Z}$, and here we construct a partition function from the ensemble of possible trajectories that an agent might take in a Markov decision process. Although value functions and $Q$-functions can be derived from this partition function and interpreted via average energies, the $\mathcal{Z}$-function provides an object with its own Bellman equation that can form the basis of alternative dynamic programming approaches. Moreover, when the MDP dynamics are deterministic, the Bellman equation for $\mathcal{Z}$ is linear, allowing direct solutions that are unavailable for the nonlinear equations associated with traditional value functions. The policies learned via these $\mathcal{Z}$-based Bellman updates are tightly linked to Boltzmann-like policy parameterizations. In addition to sampling actions proportionally to the exponential of the expected cumulative reward as Boltzmann policies would, these policies take entropy into account, favoring states from which many outcomes are possible.
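As a rough illustration of the linearity claim (using notation not fixed by the abstract: an inverse temperature $\beta$, a reward $r(s,a)$, and a deterministic transition map $s' = f(s,a)$), the Bellman equation for such a trajectory partition function in a deterministic MDP can be written schematically as

$$\mathcal{Z}(s) \;=\; \sum_{a} e^{\beta\, r(s,a)}\, \mathcal{Z}\big(f(s,a)\big),$$

which is linear in the unknowns $\mathcal{Z}(\cdot)$, unlike the max (or soft-max) that makes the usual Bellman equations for $V$ and $Q$ nonlinear. A Boltzmann-like policy then weights each action by its contribution to this sum, $\pi(a\mid s) \propto e^{\beta\, r(s,a)}\, \mathcal{Z}(f(s,a))$, so states from which many rewarding continuations are possible receive extra weight — the entropy effect described above.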




Read also

In this paper we aim to demonstrate how a physical perspective enriches the usual statistical analysis when dealing with a complex system of many interacting agents of non-physical origin. To this end, we discuss the analysis of urban public transportation networks viewed as complex systems. In such studies, a multi-disciplinary approach is applied, integrating methods from both data processing and statistical physics to investigate the correlation between the topological features of public transportation networks and their operational stability. The studies incorporate concepts of coarse graining and clusterization, universality and scaling, stability and percolation behavior, diffusion and fractal analysis.
Dietrich Stauffer, 2011
The image of physics is connected with simple mechanical deterministic events: that an apple always falls down, that force equals mass times acceleration. Indeed, applications of such concepts to social or historical problems go back two centuries (population growth and stabilisation, by Malthus and by Verhulst) and use differential equations, as recently reviewed by Vitanov and Ausloos [2011]. However, since even today's computers cannot follow the motion of all air molecules within one cubic centimeter, the probabilistic approach has become fashionable since Ludwig Boltzmann invented Statistical Physics in the 19th century. Computer simulations in Statistical Physics deal with single particles, a method called agent-based modelling in fields which adopted it later. Particularly simple are binary models where each particle has only two choices, called spin up and spin down by physicists, bit zero and bit one by computer scientists, and voters for the Republicans or for the Democrats in American politics (where one human is simulated as one particle). Neighbouring particles may influence each other, and the Ising model of 1925 is the best-studied example of such models. This text will explain to the reader how to program the Ising model on a square lattice (in the Fortran language); starting from there, readers can build their own computer programs. Some applications of Statistical Physics outside the natural sciences will be listed.
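Since the text's worked example is the Ising model on a square lattice, here is a minimal sketch of the same idea in Python rather than the Fortran of the text; the lattice size, temperature, and the choice of the Metropolis acceptance rule are illustrative assumptions, not taken from the abstract.

import numpy as np

def metropolis_sweep(spins, beta, rng):
    # One Metropolis sweep over an L x L lattice of +/-1 spins
    # with periodic boundary conditions and coupling J = 1.
    L = spins.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(0, L, size=2)
        neighbours = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
                      + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2.0 * spins[i, j] * neighbours   # energy cost of flipping spin (i, j)
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            spins[i, j] *= -1

rng = np.random.default_rng(0)
spins = rng.choice([-1, 1], size=(32, 32))
for _ in range(200):
    metropolis_sweep(spins, beta=0.5, rng=rng)   # beta above the critical value ~0.4407
print("magnetisation per spin:", abs(spins.mean()))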
Contact-tracing is an essential tool to mitigate the impact of pandemics such as COVID-19. In order to achieve efficient and scalable contact-tracing in real time, digital devices can play an important role. While a lot of attention has been paid to analyzing the privacy and ethical risks of the associated mobile applications, so far much less research has been devoted to optimizing their performance and assessing their impact on the mitigation of the epidemic. We develop Bayesian inference methods to estimate the risk that an individual is infected. This inference is based on the list of their recent contacts and those contacts' own risk levels, as well as personal information such as test results or the presence of symptoms. We propose to use probabilistic risk estimation in order to optimize testing and quarantining strategies for the control of an epidemic. Our results show that in some range of epidemic spreading (typically when the manual tracing of all contacts of infected people becomes practically impossible, but before the fraction of infected people reaches the scale where a lock-down becomes unavoidable), this inference of individuals at risk could be an efficient way to mitigate the epidemic. Our approaches translate into fully distributed algorithms that only require communication between individuals who have recently been in contact. Such communication may be encrypted and anonymized, and is thus compatible with privacy-preserving standards. We conclude that probabilistic risk estimation is capable of enhancing the performance of digital contact tracing and should be considered in the mobile applications currently under development.
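The abstract does not spell out the inference rule, so the following is only a loose sketch of the general idea of combining contacts' risk levels under an assumed independent per-contact transmission probability; it is not the authors' Bayesian/message-passing method, and the parameter values are invented for illustration.

def infection_risk(prior, contact_risks, transmission_prob=0.05, tested_negative=False):
    # Crude illustrative update: combine a personal prior with the risk
    # levels of recent contacts, assuming each infected contact transmits
    # independently with probability `transmission_prob`.
    if tested_negative:
        prior *= 0.1   # assumed (illustrative) down-weighting from a negative test
    p_no_transmission = 1.0
    for p in contact_risks:
        p_no_transmission *= 1.0 - transmission_prob * p
    return 1.0 - (1.0 - prior) * p_no_transmission

# Example: a 1% prior and three recent contacts with risks 0.2, 0.5 and 0.9.
print(infection_risk(0.01, [0.2, 0.5, 0.9]))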
Typical properties of computing circuits composed of noisy logical gates are studied using the statistical physics methodology. A growth model that gives rise to typical random Boolean functions is mapped onto a layered Ising spin system, which facilitates the study of their ability to represent arbitrary formulae with a given level of error, the tolerable level of gate noise, and its dependence on the formula depth and complexity, the gates used, and the properties of the function inputs. Bounds on their performance, derived in the information theory literature via specific gates, are straightforwardly retrieved, generalized, and identified as the corresponding typical-case phase transitions. The framework is employed to derive results on error rates, function depth and sensitivity, and their dependence on the gate type and noise model used, results that are difficult to obtain via the traditional methods used in this field.
Statistical physics has proven to be a very fruitful framework to describe phenomena outside the realm of traditional physics. The last years have witnessed the attempt by physicists to study collective phenomena emerging from the interactions of individuals as elementary units in social structures. Here we review the state of the art, focusing on a wide range of topics from opinion, cultural and language dynamics to crowd behavior, hierarchy formation, human dynamics, and social spreading. We highlight the connections between these problems and other, more traditional, topics of statistical physics. We also emphasize the comparison of model results with empirical data from social systems.
