
On Decentralized Estimation with Active Queries

Published by Theodoros Tsiligkaridis
Publication date: 2013
Research field: Informatics Engineering
Research language: English





We consider the problem of decentralized 20 questions with noise for multiple players/agents under the minimum entropy criterion in the setting of stochastic search over a parameter space, with application to target localization. We propose decentralized extensions of the active query-based stochastic search strategy that combines elements from the 20 questions approach and social learning. We prove convergence to correct consensus on the value of the parameter. This framework provides a flexible and tractable mathematical model for decentralized parameter estimation systems based on active querying. We illustrate the effectiveness and robustness of the proposed decentralized collaborative 20 questions algorithm for random network topologies with information sharing.
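The combination of noisy 20 questions and social learning described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the ring network, the uniform averaging weights, the grid discretization, the error probability `eps`, and all function names are assumptions made for illustration. Each agent first averages its belief with its neighbours' beliefs, then asks a bisection query at the median of its belief and applies a Bayes update to the noisy answer.

```python
import random

def bayes_update(belief, grid, threshold, answer, eps):
    """Posterior after a noisy yes/no answer to 'is the target <= threshold?'."""
    post = [p * ((1.0 - eps) if ((x <= threshold) == answer) else eps)
            for p, x in zip(belief, grid)]
    total = sum(post)
    return [p / total for p in post]

def median_point(belief, grid):
    """Smallest grid point whose cumulative belief reaches 1/2."""
    acc = 0.0
    for p, x in zip(belief, grid):
        acc += p
        if acc >= 0.5:
            return x
    return grid[-1]

def run(num_agents=4, grid_size=200, target=0.37, eps=0.1, steps=300, seed=0):
    rng = random.Random(seed)
    grid = [(k + 0.5) / grid_size for k in range(grid_size)]
    beliefs = [[1.0 / grid_size] * grid_size for _ in range(num_agents)]
    # Assumed topology: a ring in which each agent listens to itself and its two neighbours.
    neighbours = [sorted({(i - 1) % num_agents, i, (i + 1) % num_agents})
                  for i in range(num_agents)]
    for _ in range(steps):
        # Social-learning step: average beliefs over the neighbourhood.
        mixed = [[sum(beliefs[j][k] for j in nbrs) / len(nbrs)
                  for k in range(grid_size)] for nbrs in neighbours]
        # Query step: each agent bisects its own belief and gets a noisy answer.
        beliefs = []
        for b in mixed:
            threshold = median_point(b, grid)
            truth = target <= threshold
            answer = truth if rng.random() > eps else (not truth)
            beliefs.append(bayes_update(b, grid, threshold, answer, eps))
    return grid, beliefs

def estimates(grid, beliefs):
    """Maximum a posteriori grid point of each agent."""
    return [grid[max(range(len(grid)), key=b.__getitem__)] for b in beliefs]

grid, beliefs = run()
print(estimates(grid, beliefs))
```

In this toy setting the agents' estimates should cluster near the hypothetical target value 0.37, which mirrors the consensus behaviour the abstract claims, although the sketch itself proves nothing.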


Read also

We study the multi-agent safe control problem where agents should avoid collisions with static obstacles and with each other while reaching their goals. Our core idea is to learn the multi-agent control policy jointly with learning the control barrier functions as safety certificates. We propose a novel joint-learning framework that can be implemented in a decentralized fashion, with generalization guarantees for certain function classes. Such a decentralized framework can adapt to an arbitrarily large number of agents. Building upon this framework, we further improve the scalability by incorporating neural network architectures that are invariant to the quantity and permutation of neighboring agents. In addition, we propose a new spontaneous policy refinement method to further enforce the certificate condition during testing. We provide extensive experiments to demonstrate that our method significantly outperforms other leading multi-agent control approaches in terms of maintaining safety and completing original tasks. Our approach also shows exceptional generalization capability in that the control policy can be trained with 8 agents in one scenario, while being used on other scenarios with up to 1024 agents in complex multi-agent environments and dynamics.
In this paper, we consider the problem of controlling a partially observed Markov decision process (POMDP) in order to actively estimate its state trajectory over a fixed horizon with minimal uncertainty. We pose a novel active smoothing problem in which the objective is to directly minimise the smoother entropy, that is, the conditional entropy of the (joint) state trajectory distribution of concern in fixed-interval Bayesian smoothing. Our formulation contrasts with prior active approaches that minimise the sum of conditional entropies of the (marginal) state estimates provided by Bayesian filters. By establishing a novel form of the smoother entropy in terms of the POMDP belief (or information) state, we show that our active smoothing problem can be reformulated as a (fully observed) Markov decision process with a value function that is concave in the belief state. The concavity of the value function is of particular importance since it enables the approximate solution of our active smoothing problem using piecewise-linear function approximations in conjunction with standard POMDP solvers. We illustrate the approximate solution of our active smoothing problem in simulation and compare its performance to alternative approaches based on minimising marginal state estimate uncertainties.
Motivated by the emerging use of multi-agent reinforcement learning (MARL) in engineering applications such as networked robotics, swarming drones, and sensor networks, we investigate the policy evaluation problem in a fully decentralized setting, using temporal-difference (TD) learning with linear function approximation to handle large state spaces in practice. The goal of a group of agents is to collaboratively learn the value function of a given policy from locally private rewards observed in a shared environment, through exchanging local estimates with neighbors. Despite the simplicity and widespread use of such decentralized TD learning algorithms, our theoretical understanding of them remains limited. Existing results were obtained based on i.i.d. data samples, or by imposing an "additional" projection step to control the "gradient" bias incurred by the Markovian observations. In this paper, we provide a finite-sample analysis of fully decentralized TD(0) learning under both i.i.d. and Markovian samples, and prove that all local estimates converge linearly to a small neighborhood of the optimum. The resulting error bounds are the first of their kind, in the sense that they hold under the most practical assumptions, which is made possible by a novel multi-step Lyapunov analysis.
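As a rough illustration of the setting in this abstract, the sketch below runs decentralized TD(0) with one-hot (tabular) features on a toy two-state chain. Everything specific here is an assumption chosen for illustration rather than taken from the paper: the chain (next state drawn uniformly), the ring of four agents, the mixing matrix, the local rewards, and the constant step size `alpha`. Each agent takes a local TD(0) step with its private reward and combines it with an average of its neighbours' estimates.

```python
import random

def decentralized_td0(local_rewards, mixing, gamma=0.5, alpha=0.02,
                      steps=5000, seed=0):
    rng = random.Random(seed)
    n = len(local_rewards)
    num_states = len(local_rewards[0])
    w = [[0.0] * num_states for _ in range(n)]  # one weight vector per agent
    s = 0
    for _ in range(steps):
        s_next = rng.randrange(num_states)  # toy chain: uniform next state
        # Consensus step: weighted average of neighbours' estimates.
        mixed = [[sum(mixing[i][j] * w[j][k] for j in range(n))
                  for k in range(num_states)] for i in range(n)]
        # Local TD(0) step with the agent's private reward (one-hot features).
        for i in range(n):
            delta = local_rewards[i][s] + gamma * w[i][s_next] - w[i][s]
            mixed[i][s] += alpha * delta
        w = mixed
        s = s_next
    return w

# Hypothetical problem data: only agent 0 is rewarded, in state 0.
local_rewards = [[4.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
third = 1.0 / 3.0
mixing = [[third, third, 0.0, third],
          [third, third, third, 0.0],
          [0.0, third, third, third],
          [third, 0.0, third, third]]
weights = decentralized_td0(local_rewards, mixing)
print(weights)
```

For this toy chain the value function of the network-averaged reward (1, 0) with discount 0.5 is V = (1.5, 0.5), so every agent's estimate should end up in a neighborhood of that vector even though three of the four agents never observe a reward. With a constant step size the estimates hover near the target rather than converging exactly, which is consistent with the "small neighborhood" wording of the abstract.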
The paper studies distributed static parameter (vector) estimation in sensor networks with nonlinear observation models and noisy inter-sensor communication. It introduces separably estimable observation models that generalize the observability condition in linear centralized estimation to nonlinear distributed estimation. It studies two distributed estimation algorithms in separably estimable models, the $\mathcal{NU}$ (with its linear counterpart $\mathcal{LU}$) and the $\mathcal{NLU}$. Their update rule combines a consensus step (where each sensor updates its state by taking a weighted average with its neighbors' states) and an innovation step (where each sensor processes its current local observation). This makes the three algorithms of the consensus + innovations type, very different from traditional consensus. The paper proves consistency (all sensors reach consensus almost surely and converge to the true parameter value), efficiency, and asymptotic unbiasedness. For $\mathcal{LU}$ and $\mathcal{NU}$, it proves asymptotic normality and provides convergence rate guarantees. The three algorithms are characterized by appropriately chosen decaying weight sequences. Algorithms $\mathcal{LU}$ and $\mathcal{NU}$ are analyzed in the framework of stochastic approximation theory; algorithm $\mathcal{NLU}$ exhibits mixed time-scale behavior and biased perturbations, and its analysis requires a different approach that is developed in the paper.
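The consensus + innovations update described in this abstract can be sketched for a linear observation model. This is a simplified illustration in the spirit of the linear variant, not the paper's exact algorithm: the three-sensor network, the observation vectors `H`, the noise level, and the weight schedules `alpha` and `beta` are all assumptions. Each sensor sees only a scalar projection of a two-dimensional parameter, so no sensor can identify it alone.

```python
import random

def consensus_innovations(theta, H, neighbours, steps=5000, sigma=0.1, seed=0):
    rng = random.Random(seed)
    n, d = len(H), len(theta)
    x = [[0.0] * d for _ in range(n)]  # local estimates, one per sensor
    for t in range(steps):
        alpha = 3.0 / (t + 10)        # innovation weight (assumed schedule)
        beta = 0.3 / (t + 1) ** 0.3   # consensus weight, decays more slowly
        new_x = []
        for i in range(n):
            # Noisy scalar observation y_i = h_i . theta + noise.
            y = sum(h * th for h, th in zip(H[i], theta)) + rng.gauss(0.0, sigma)
            resid = y - sum(h * xi for h, xi in zip(H[i], x[i]))
            row = []
            for k in range(d):
                # Consensus term: disagreement with neighbours' estimates.
                disagreement = sum(x[i][k] - x[j][k] for j in neighbours[i])
                # Innovation term: correction driven by the local observation.
                row.append(x[i][k] - beta * disagreement + alpha * H[i][k] * resid)
            new_x.append(row)
        x = new_x
    return x

theta = [1.0, -2.0]                        # hypothetical true parameter
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # each sensor sees one projection
neighbours = [[1, 2], [0, 2], [0, 1]]      # fully connected three-sensor network
est = consensus_innovations(theta, H, neighbours)
print(est)
```

Both weight sequences decay, with the consensus weight decaying more slowly than the innovation weight, so agreement between sensors is enforced faster than the local observations are averaged out. In this toy run all three estimates should land close to the hypothetical `theta`.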
Stochastic stability for centralized time-varying Kalman filtering over a wireless sensor network with correlated fading channels is studied. On their route to the gateway, sensor packets, possibly aggregated with measurements from several nodes, may be dropped because of fading links. To study this situation, we introduce a network state process, which describes a finite set of configurations of the radio environment. The network state characterizes the channel gain distributions of the links, which are allowed to be correlated with each other. Temporal correlations of channel gains are modeled by allowing the network state process to form a (semi-)Markov chain. We establish sufficient conditions that ensure that the Kalman filter is exponentially bounded. In the one-sensor case, this new stability condition is shown to include previous results obtained in the literature as special cases. The results also hold when using power and bit-rate control policies, where the transmission power and bit-rate of each node are nonlinear mappings of the network state and channel gains.