We introduce Robust Restless Bandits, a challenging generalization of restless multi-armed bandits (RMABs). RMABs have been widely studied for intervention planning with limited resources. However, most works make the unrealistic assumption that the transition dynamics are known perfectly, restricting the applicability of existing methods to real-world scenarios. To make RMABs more useful in settings with uncertain dynamics: (i) We introduce the Robust RMAB problem and develop solutions for a minimax regret objective when transitions are given by interval uncertainties; (ii) We develop a double oracle algorithm for solving Robust RMABs and demonstrate its effectiveness on three experimental domains; (iii) To enable our double oracle approach, we introduce RMABPPO, a novel deep reinforcement learning algorithm for solving RMABs. RMABPPO hinges on learning an auxiliary $\lambda$-network that allows each arm's learning to decouple, greatly reducing the sample complexity required for training; (iv) Under minimax regret, the adversary in the double oracle approach is notoriously difficult to implement due to non-stationarity. To address this, we formulate the adversary oracle as a multi-agent reinforcement learning problem and solve it with a multi-agent extension of RMABPPO, which may be of independent interest as the first known algorithm for this setting. Code is available at https://github.com/killian-34/RobustRMAB.
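To make the decoupling idea behind the $\lambda$-network concrete, the sketch below (our own illustration, not the paper's implementation) uses the classic Lagrangian relaxation of the RMAB budget constraint: charging a price $\lambda$ per activation lets each arm be planned independently, and $\lambda$ is then adjusted by a subgradient step until the expected number of activations matches the budget. The transition/reward arrays, the greedy-policy proxy for expected activations, and the step sizes are all illustrative assumptions.

```python
import numpy as np

def arm_value_iteration(P, R, lam, gamma=0.95, iters=200):
    """Solve a single arm's MDP with activation penalized by lam.
    P: (2, S, S) transitions for actions 0=passive, 1=active; R: (S,) rewards."""
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        Q = np.stack([R + gamma * P[a] @ V - lam * a for a in (0, 1)])
        V = Q.max(axis=0)
    return Q.argmax(axis=0)  # greedy per-arm policy, one action per state

def tune_lambda(arms, budget, steps=100, lr=0.1):
    """Subgradient ascent on the multiplier: raise lam when the decoupled
    per-arm policies activate more often than the budget allows."""
    lam = 0.0
    for _ in range(steps):
        # crude proxy: average activation rate of each arm's greedy policy
        acts = sum(arm_value_iteration(P, R, lam).mean() for P, R in arms)
        lam = max(0.0, lam + lr * (acts - budget))
    return lam
```

RMABPPO replaces these tabular per-arm solves with per-arm policy networks and learns the multiplier with its $\lambda$-network; the sketch only illustrates why pricing activations lets the arms train independently.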
A growing body of work in game theory extends the traditional Stackelberg game to settings with one leader and multiple followers who play a Nash equilibrium. Standard approaches for computing equilibria in these games reformulate the followers' best responses as constraints in the leader's optimization problem. These reformulation approaches can sometimes be effective, but often get trapped in low-quality solutions when the followers' objectives are non-linear or non-quadratic. Moreover, these approaches assume a unique equilibrium or a specific equilibrium concept, e.g., optimistic or pessimistic, which is a limiting assumption in many situations. To overcome these limitations, we propose a stochastic gradient descent-based approach, where the leader's strategy is updated by differentiating through the followers' best responses. We frame the leader's optimization as a learning problem against the followers' equilibrium, which allows us to decouple the followers' equilibrium constraints from the leader's problem. This approach also handles multiple equilibria and arbitrary equilibrium selection procedures by back-propagating through a sampled Nash equilibrium. To this end, this paper introduces a novel concept called the equilibrium flow to formally characterize the set of equilibrium selection processes for which the gradient with respect to a sampled equilibrium is an unbiased estimate of the true gradient. We evaluate our approach experimentally against existing baselines in three Stackelberg problems with multiple followers and find that in each case, our approach achieves higher utility for the leader.
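The mechanism of back-propagating the leader's utility through a sampled equilibrium can be illustrated with a toy PyTorch sketch. Here two followers have quadratic costs, so their Nash equilibrium has a closed form that autograd can differentiate directly; the payoffs, the leader's target, and the learning rate are illustrative assumptions, and in general the equilibrium would come from a solver rather than a linear system.

```python
import torch

theta = torch.zeros(2, requires_grad=True)      # leader's strategy

def follower_equilibrium(theta):
    # Follower i minimizes (x_i - theta_i)^2 + 0.5 * x_i * x_j, so the
    # Nash equilibrium solves the stationarity system A @ x = 2 * theta.
    A = torch.tensor([[2.0, 0.5], [0.5, 2.0]])
    return torch.linalg.solve(A, 2.0 * theta)    # differentiable in theta

opt = torch.optim.SGD([theta], lr=0.05)
for _ in range(200):
    x = follower_equilibrium(theta)              # sampled equilibrium
    loss = (x - 1.0).pow(2).sum()                # leader wants x == (1, 1)
    opt.zero_grad()
    loss.backward()                              # gradient flows through x
    opt.step()
```

The decoupling is visible in the structure: the followers' equilibrium conditions never appear as constraints in the leader's problem; they are simply a differentiable function evaluated inside the leader's loss.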
Aluminum nitride (AlN) plays a key role in modern power electronics and deep-ultraviolet photonics, where an understanding of its thermal properties is essential. Here we measure the thermal conductivity of crystalline AlN by the 3$\omega$ method, finding it ranges from 674 $\pm$ 56 W/m/K at 100 K to 186 $\pm$ 7 W/m/K at 400 K, with a value of 237 $\pm$ 6 W/m/K at room temperature. We compare these data with analytical models and first-principles calculations, taking into account atomic-scale defects (O, Si, and C impurities, and Al vacancies). We find that Al vacancies play the greatest role in reducing the thermal conductivity because they cause the strongest mass-difference scattering. Modeling also reveals that 10% of heat conduction is contributed by phonons with long mean free paths (MFPs), over ~7 $\mu$m at room temperature, and 50% by phonons with MFPs over ~0.3 $\mu$m. Consequently, the effective thermal conductivity of AlN is strongly reduced in sub-micron thin films or devices due to phonon-boundary scattering.
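For reference, point-defect scattering of this kind is commonly described by the Klemens mass-difference rate (quoted here as a standard textbook formula; the exact model used in the paper may differ):

$$\tau_{\mathrm{imp}}^{-1}(\omega) = \frac{V_0\,\Gamma\,\omega^{4}}{4\pi v^{3}}, \qquad \Gamma = \sum_i f_i \left(1 - \frac{m_i}{\bar{m}}\right)^{2},$$

where $V_0$ is the volume per atom, $v$ the sound velocity, $f_i$ the fractional concentration of defect species $i$ with mass $m_i$, and $\bar{m}$ the average atomic mass. A vacancy removes an entire atomic mass, maximizing the $(1 - m_i/\bar{m})^2$ factor, which is consistent with Al vacancies dominating the reduction in thermal conductivity.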
Illegal wildlife poaching threatens ecosystems and drives endangered species toward extinction. However, efforts for wildlife protection are constrained by the limited resources of law enforcement agencies. To help combat poaching, the Protection Assistant for Wildlife Security (PAWS) has been developed as a machine learning pipeline that takes a data-driven approach to identifying areas at high risk of poaching throughout protected areas and computing optimal patrol routes. In this paper, we take an end-to-end view of the data-to-deployment pipeline for anti-poaching. In doing so, we address challenges including extreme class imbalance (up to 1:200), bias, and uncertainty in wildlife poaching data to enhance PAWS, and we apply our methodology to three national parks with diverse characteristics. (i) We use Gaussian processes to quantify predictive uncertainty, which we exploit to improve the robustness of our prescribed patrols and increase the detection of snares by an average of 30%. We evaluate our approach on real-world historical poaching data from Murchison Falls and Queen Elizabeth National Parks in Uganda and, for the first time, Srepok Wildlife Sanctuary in Cambodia. (ii) We present the results of large-scale field tests conducted in Murchison Falls and Srepok Wildlife Sanctuary, which confirm that the predictive power of PAWS extends promisingly to multiple parks. This paper is part of an effort to expand PAWS to 800 parks around the world through integration with SMART conservation software.
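As a rough illustration of point (i) (a minimal sketch, not the actual PAWS pipeline), a Gaussian process can supply both a risk estimate and its uncertainty for every map cell, and an upper-confidence-bound score can then steer patrols toward cells that are either high-risk or poorly understood. The features, labels, and UCB weight below are placeholder assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 1, size=(200, 2))     # stand-in geographic features
y_train = (X_train[:, 0] > 0.6).astype(float)  # stand-in snare observations

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X_train, y_train)

X_cells = rng.uniform(0, 1, size=(500, 2))     # candidate patrol cells
mean, std = gp.predict(X_cells, return_std=True)
ucb = mean + 1.0 * std                         # high risk + high uncertainty
patrol_cells = np.argsort(-ucb)[:20]           # top-20 cells to patrol first
```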