We introduce Robust Restless Bandits, a challenging generalization of restless multi-armed bandits (RMABs). RMABs have been widely studied for intervention planning with limited resources. However, most works make the unrealistic assumption that the transition dynamics are known perfectly, restricting the applicability of existing methods to real-world scenarios. To make RMABs more useful in settings with uncertain dynamics: (i) We introduce the Robust RMAB problem and develop solutions for a minimax regret objective when transitions are given by interval uncertainties; (ii) We develop a double oracle algorithm for solving Robust RMABs and demonstrate its effectiveness on three experimental domains; (iii) To enable our double oracle approach, we introduce RMABPPO, a novel deep reinforcement learning algorithm for solving RMABs. RMABPPO hinges on learning an auxiliary $\lambda$-network that allows each arm's learning to decouple, greatly reducing the sample complexity required for training; (iv) Under minimax regret, the adversary in the double oracle approach is notoriously difficult to implement due to non-stationarity. To address this, we formulate the adversary oracle as a multi-agent reinforcement learning problem and solve it with a multi-agent extension of RMABPPO, which may be of independent interest as the first known algorithm for this setting. Code is available at https://github.com/killian-34/RobustRMAB.
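To make the decoupling idea behind the $\lambda$-network concrete, the sketch below (our own illustration, not the paper's implementation) uses the classic Lagrangian relaxation of the RMAB budget constraint: charging a price $\lambda$ per activation lets each arm be planned independently, and $\lambda$ is then adjusted by a subgradient step until the expected number of activations matches the budget. The transition/reward arrays, the greedy-policy proxy for expected activations, and the step sizes are all illustrative assumptions.

```python
import numpy as np

def arm_value_iteration(P, R, lam, gamma=0.95, iters=200):
    """Solve a single arm's MDP with activation penalized by lam.
    P: (2, S, S) transitions for actions 0=passive, 1=active; R: (S,) rewards."""
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        Q = np.stack([R + gamma * P[a] @ V - lam * a for a in (0, 1)])
        V = Q.max(axis=0)
    return Q.argmax(axis=0)  # greedy per-arm policy, one action per state

def tune_lambda(arms, budget, steps=100, lr=0.1):
    """Subgradient ascent on the multiplier: raise lam when the decoupled
    per-arm policies activate more often than the budget allows."""
    lam = 0.0
    for _ in range(steps):
        # crude proxy: average activation rate of each arm's greedy policy
        acts = sum(arm_value_iteration(P, R, lam).mean() for P, R in arms)
        lam = max(0.0, lam + lr * (acts - budget))
    return lam
```

RMABPPO replaces these tabular per-arm solves with per-arm policy networks and learns the multiplier with its $\lambda$-network; the sketch only illustrates why pricing activations lets the arms train independently.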
A growing body of work in game theory extends the traditional Stackelberg game to settings with one leader and multiple followers who play a Nash equilibrium. Standard approaches for computing equilibria in these games reformulate the followers' best responses as constraints in the leader's optimization problem. These reformulation approaches can sometimes be effective, but often get trapped in low-quality solutions when the followers' objectives are non-linear or non-quadratic. Moreover, these approaches assume a unique equilibrium or a specific equilibrium concept, e.g., optimistic or pessimistic, which is a limiting assumption in many situations. To overcome these limitations, we propose a stochastic gradient descent-based approach, where the leader's strategy is updated by differentiating through the followers' best responses. We frame the leader's optimization as a learning problem against the followers' equilibrium, which allows us to decouple the followers' equilibrium constraints from the leader's problem. This approach also handles multiple equilibria and arbitrary equilibrium selection procedures by back-propagating through a sampled Nash equilibrium. To this end, this paper introduces a novel concept called the equilibrium flow to formally characterize the set of equilibrium selection processes for which the gradient with respect to a sampled equilibrium is an unbiased estimate of the true gradient. We evaluate our approach experimentally against existing baselines in three Stackelberg problems with multiple followers and find that in each case, our approach achieves higher utility for the leader.
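The mechanism of back-propagating the leader's utility through a sampled equilibrium can be illustrated with a toy PyTorch sketch. Here two followers have quadratic costs, so their Nash equilibrium has a closed form that autograd can differentiate directly; the payoffs, the leader's target, and the learning rate are illustrative assumptions, and in general the equilibrium would come from a solver rather than a linear system.

```python
import torch

theta = torch.zeros(2, requires_grad=True)      # leader's strategy

def follower_equilibrium(theta):
    # Follower i minimizes (x_i - theta_i)^2 + 0.5 * x_i * x_j, so the
    # Nash equilibrium solves the stationarity system A @ x = 2 * theta.
    A = torch.tensor([[2.0, 0.5], [0.5, 2.0]])
    return torch.linalg.solve(A, 2.0 * theta)    # differentiable in theta

opt = torch.optim.SGD([theta], lr=0.05)
for _ in range(200):
    x = follower_equilibrium(theta)              # sampled equilibrium
    loss = (x - 1.0).pow(2).sum()                # leader wants x == (1, 1)
    opt.zero_grad()
    loss.backward()                              # gradient flows through x
    opt.step()
```

The decoupling is visible in the structure: the followers' equilibrium conditions never appear as constraints in the leader's problem; they are simply a differentiable function evaluated inside the leader's loss.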
Aluminum nitride (AlN) plays a key role in modern power electronics and deep-ultraviolet photonics, where an understanding of its thermal properties is essential. Here we measure the thermal conductivity of crystalline AlN by the 3$\omega$ method, finding it ranges from 674 $\pm$ 56 W/m/K at 100 K to 186 $\pm$ 7 W/m/K at 400 K, with a value of 237 $\pm$ 6 W/m/K at room temperature. We compare these data with analytical models and first-principles calculations, taking into account atomic-scale defects (O, Si, and C impurities, and Al vacancies). We find that Al vacancies play the greatest role in reducing the thermal conductivity because they cause the strongest mass-difference scattering. Modeling also reveals that 10% of heat conduction is contributed by phonons with long mean free paths (MFPs), over ~7 $\mu$m at room temperature, and 50% by phonons with MFPs over ~0.3 $\mu$m. Consequently, the effective thermal conductivity of AlN is strongly reduced in sub-micron thin films or devices due to phonon-boundary scattering.
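For reference, point-defect scattering of this kind is commonly described by the Klemens mass-difference rate (quoted here as a standard textbook formula; the exact model used in the paper may differ):

$$\tau_{\mathrm{imp}}^{-1}(\omega) = \frac{V_0\,\Gamma\,\omega^{4}}{4\pi v^{3}}, \qquad \Gamma = \sum_i f_i \left(1 - \frac{m_i}{\bar{m}}\right)^{2},$$

where $V_0$ is the volume per atom, $v$ the sound velocity, $f_i$ the fractional concentration of defect species $i$ with mass $m_i$, and $\bar{m}$ the average atomic mass. A vacancy removes an entire atomic mass, maximizing the $(1 - m_i/\bar{m})^2$ factor, which is consistent with Al vacancies dominating the reduction in thermal conductivity.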
Illegal wildlife poaching threatens ecosystems and drives endangered species toward extinction. However, efforts for wildlife protection are constrained by the limited resources of law enforcement agencies. To help combat poaching, the Protection Assistant for Wildlife Security (PAWS) has been developed as a machine learning pipeline that takes a data-driven approach to identifying areas at high risk of poaching throughout protected areas and computing optimal patrol routes. In this paper, we take an end-to-end view of the data-to-deployment pipeline for anti-poaching. In doing so, we address challenges including extreme class imbalance (up to 1:200), bias, and uncertainty in wildlife poaching data to enhance PAWS, and we apply our methodology to three national parks with diverse characteristics. (i) We use Gaussian processes to quantify predictive uncertainty, which we exploit to improve the robustness of our prescribed patrols and increase the detection of snares by an average of 30%. We evaluate our approach on real-world historical poaching data from Murchison Falls and Queen Elizabeth National Parks in Uganda and, for the first time, Srepok Wildlife Sanctuary in Cambodia. (ii) We present the results of large-scale field tests conducted in Murchison Falls and Srepok Wildlife Sanctuary, which confirm that the predictive power of PAWS extends promisingly to multiple parks. This paper is part of an effort to expand PAWS to 800 parks around the world through integration with SMART conservation software.
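As a rough illustration of point (i) (a minimal sketch, not the actual PAWS pipeline), a Gaussian process can supply both a risk estimate and its uncertainty for every map cell, and an upper-confidence-bound score can then steer patrols toward cells that are either high-risk or poorly understood. The features, labels, and UCB weight below are placeholder assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 1, size=(200, 2))     # stand-in geographic features
y_train = (X_train[:, 0] > 0.6).astype(float)  # stand-in snare observations

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X_train, y_train)

X_cells = rng.uniform(0, 1, size=(500, 2))     # candidate patrol cells
mean, std = gp.predict(X_cells, return_std=True)
ucb = mean + 1.0 * std                         # high risk + high uncertainty
patrol_cells = np.argsort(-ucb)[:20]           # top-20 cells to patrol first
```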