The fragmentation process in massive star-forming regions is one of the contemporary problems in astrophysics, and several physical processes have been proposed to control the fragmentation, including turbulence, magnetic fields, rotation, stellar feedback, and gravity. However, the fragmentation process has been poorly studied at small spatial scales well below 1000 AU. We aim to use ALMA (Atacama Large Millimeter/submillimeter Array) high angular resolution data to identify the fragments in W51 IRS2 and to study the fragmentation properties on a spatial scale of 200 AU. We used ALMA data of W51 IRS2 from three projects, which give an angular resolution of 0.028$^{\prime\prime}$ (144 AU) at millimeter wavelengths. We identified compact fragments by using {\it uv}-range constrained 1.3 mm continuum data. A Mean Surface Density of Companions (MSDC) analysis was performed to study the separations between fragments. A total of 33 continuum sources are identified, 29 of which are defined as fragments in the surveyed region. The MSDC analysis reveals two breaks corresponding to spatial scales of 1845 AU and 7346 AU, indicative of a two-level clustering phenomenon, along with a linear regime below 1845 AU, mostly associated with W51 North, whose slope is consistent with the slope of the clustering regime of other cluster-like regions in the Galaxy. The typical masses and separations of the fragments, as well as the relation between density and number of fragments, can be explained through a thermal Jeans process operating at high temperatures of 200--400 K, consistent with previous measurements of the temperature in the region and produced by the nearby massive stars. Therefore, although W51 IRS2 seems to be undergoing a thermally inhibited fragmentation phase, this does not seem to prevent the formation of a protocluster associated with W51 North.
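For intuition on why such high temperatures inhibit fragmentation, the short sketch below evaluates the thermal Jeans length and mass at 20, 200, and 400 K. The H$_2$ number density of $10^7$ cm$^{-3}$ is an assumed, illustrative value and is not taken from the abstract.

```python
import numpy as np

# Physical constants (cgs)
G = 6.674e-8        # gravitational constant [cm^3 g^-1 s^-2]
k_B = 1.381e-16     # Boltzmann constant [erg K^-1]
m_H = 1.673e-24     # hydrogen mass [g]
mu = 2.33           # mean molecular weight per particle (assumed)
AU = 1.496e13       # cm
M_sun = 1.989e33    # g

def jeans_scales(T, n_H2):
    """Thermal Jeans length [AU] and mass [M_sun] for temperature T [K]
    and H2 number density n_H2 [cm^-3]."""
    rho = mu * m_H * n_H2                       # mass density
    c_s = np.sqrt(k_B * T / (mu * m_H))         # isothermal sound speed
    lam_J = np.sqrt(np.pi * c_s**2 / (G * rho))
    M_J = (np.pi**2.5 / 6.0) * c_s**3 / np.sqrt(G**3 * rho)
    return lam_J / AU, M_J / M_sun

for T in (20, 200, 400):                        # cold core vs. externally heated gas
    lam, M = jeans_scales(T, n_H2=1e7)          # assumed density, for illustration only
    print(f"T = {T:3d} K: lambda_J ~ {lam:7.0f} AU, M_J ~ {M:5.1f} Msun")
```

Raising the temperature from 20 K to a few hundred K increases both the Jeans length and the Jeans mass by more than an order of magnitude, which is the sense in which fragmentation is thermally inhibited.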
Discrete-continuous hybrid action space is a natural setting in many practical problems, such as robot control and game AI. However, most previous Reinforcement Learning (RL) works only demonstrate success in controlling with either discrete or continuous action space, while seldom taking into account the hybrid action space. One naive way to address hybrid-action RL is to convert the hybrid action space into a unified homogeneous action space by discretization or continualization, so that conventional RL algorithms can be applied. However, this ignores the underlying structure of the hybrid action space and also induces scalability issues and additional approximation difficulties, thus leading to degenerated results. In this paper, we propose Hybrid Action Representation (HyAR) to learn a compact and decodable latent representation space for the original hybrid action space. HyAR constructs the latent space and embeds the dependence between the discrete action and continuous parameters via an embedding table and a conditional Variational Auto-Encoder (VAE). To further improve the effectiveness, the action representation is trained to be semantically smooth through unsupervised environmental dynamics prediction. Finally, the agent learns its policy with conventional DRL algorithms in the learned representation space and interacts with the environment by decoding the hybrid action embeddings to the original action space. We evaluate HyAR in a variety of environments with discrete-continuous action spaces. The results demonstrate the superiority of HyAR when compared with previous baselines, especially for high-dimensional action spaces.
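As an illustration of the kind of construction HyAR describes (an embedding table for the discrete action plus a conditional VAE over its continuous parameters), here is a minimal PyTorch sketch. The class name, layer sizes, dimensions, and KL weighting are placeholders of ours, not the paper's implementation.

```python
import torch
import torch.nn as nn

class HybridActionEncoder(nn.Module):
    """Illustrative sketch of a HyAR-style latent action space: an embedding
    table for the discrete action and a conditional VAE over its continuous
    parameters, conditioned on the state. Sizes are placeholders."""
    def __init__(self, n_discrete=5, param_dim=6, state_dim=16,
                 embed_dim=8, latent_dim=8):
        super().__init__()
        self.discrete_embed = nn.Embedding(n_discrete, embed_dim)
        self.encoder = nn.Sequential(
            nn.Linear(param_dim + embed_dim + state_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, latent_dim)
        self.log_std = nn.Linear(64, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + embed_dim + state_dim, 64), nn.ReLU(),
            nn.Linear(64, param_dim))

    def forward(self, state, discrete_idx, cont_params):
        e = self.discrete_embed(discrete_idx)              # condition on the discrete action
        h = self.encoder(torch.cat([cont_params, e, state], dim=-1))
        mu, log_std = self.mu(h), self.log_std(h)
        z = mu + torch.randn_like(mu) * log_std.exp()      # reparameterization trick
        recon = self.decoder(torch.cat([z, e, state], dim=-1))
        return recon, mu, log_std

def vae_loss(recon, target, mu, log_std):
    """ELBO-style loss: reconstruct the continuous parameters + KL regularizer."""
    rec = ((recon - target) ** 2).sum(-1).mean()
    kl = (-0.5 * (1 + 2 * log_std - mu ** 2 - (2 * log_std).exp()).sum(-1)).mean()
    return rec + 1e-2 * kl

# Toy usage with random tensors
enc = HybridActionEncoder()
s = torch.randn(4, 16); k = torch.randint(0, 5, (4,)); x = torch.randn(4, 6)
recon, mu, log_std = enc(s, k, x)
print(vae_loss(recon, x, mu, log_std))
```

A downstream DRL agent would then act in the concatenated (embedding, latent) space and decode back to a hybrid action before stepping the environment.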
Jiyao Tang, 2021
We prove that the torsion points of an abelian variety are equidistributed over the corresponding Berkovich space with respect to the canonical measure.
We performed systematic angle-resolved photoemission spectroscopy measurements $in$-$situ$ on $T$-${\rm La}_{2-x}{\rm Ce}_x{\rm CuO}_{4\pm\delta}$ (LCCO) thin films over an extended doping range, prepared by a refined ozone/vacuum annealing method. The electron doping level ($n$), estimated from the measured Fermi surface volume, varies from 0.05 to 0.23, which covers the whole superconducting dome. We observed an absence of insulating behavior around $n \sim 0.05$, and the Fermi surface reconstruction is shifted to $n \sim 0.11$ in LCCO compared to around 0.15 in other electron-doped cuprates, suggesting that antiferromagnetism is strongly suppressed in this material. A possible explanation may lie in the enhanced $t'/t$ in LCCO, since ${\rm La^{3+}}$ has the largest ionic radius among all the lanthanide elements.
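The doping estimate from the Fermi surface volume can be illustrated with a simple Luttinger-count calculation. The conversion below and the example area fractions are our own assumed illustration (2D single-band counting), not numbers from the paper.

```python
# Illustrative Luttinger-count estimate of the doping level from the
# measured Fermi-surface area (2D, single band); the fractions below are
# hypothetical, chosen only to reproduce the quoted doping range.
def doping_from_fs(area_fs_over_bz):
    """area_fs_over_bz: Fermi-surface area as a fraction of the Brillouin zone."""
    electrons_per_cu = 2.0 * area_fs_over_bz   # factor 2 for spin degeneracy
    return electrons_per_cu - 1.0              # doping n relative to half filling

for frac in (0.525, 0.555, 0.615):             # hypothetical measured fractions
    print(f"A_FS/A_BZ = {frac:.3f} -> n ~ {doping_from_fs(frac):.2f}")
```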
Deep reinforcement learning (DRL) algorithms have been demonstrated to be effective in a wide range of challenging decision-making and control tasks. However, these methods typically suffer from severe action oscillations, particularly in the discrete action setting, which means that agents select different actions within consecutive steps even though the states differ only slightly. This issue is often neglected since a policy is usually evaluated only by its cumulative rewards. Action oscillation strongly affects the user experience and can even cause serious safety hazards, especially in real-world domains where safety is the main concern, such as autonomous driving. To this end, we introduce the Policy Inertia Controller (PIC), which serves as a generic plug-in framework for off-the-shelf DRL algorithms, to enable an adaptive trade-off between the optimality and smoothness of the learned policy in a formal way. We propose Nested Policy Iteration as a general training algorithm for PIC-augmented policies, which ensures monotonically non-decreasing updates under some mild conditions. Further, we derive a practical DRL algorithm, namely Nested Soft Actor-Critic. Experiments on a collection of autonomous driving tasks and several Atari games suggest that our approach achieves substantial oscillation reduction in comparison to a range of commonly adopted baselines, with almost no performance degradation.
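To make the optimality-smoothness trade-off concrete, here is a toy action-selection rule in the spirit of policy inertia; it is not the paper's actual PIC or Nested Policy Iteration, and the function name and switching margin are our own illustrative choices.

```python
import numpy as np

def inertia_action(q_values, prev_action, switch_cost=0.1):
    """Toy inertia rule: keep the previous action unless the greedy action
    improves the Q-value by more than a switching margin."""
    greedy = int(np.argmax(q_values))
    if prev_action is None:
        return greedy
    if q_values[greedy] - q_values[prev_action] < switch_cost:
        return prev_action          # repeat the previous action: smoother behaviour
    return greedy                   # switch only when clearly better

# Example: the greedy action (index 1) barely beats the previous one, so we keep 0
print(inertia_action(np.array([1.00, 1.05, 0.20]), prev_action=0))
```

Larger margins yield smoother but potentially less optimal behaviour, which is the trade-off the PIC framework formalizes.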
The value function is the central notion of Reinforcement Learning (RL). Value estimation, especially with function approximation, can be challenging since it involves the stochasticity of environmental dynamics and reward signals that can be sparse and delayed in some cases. A typical model-free RL algorithm usually estimates the values of a policy by Temporal Difference (TD) or Monte Carlo (MC) algorithms directly from rewards, without explicitly taking dynamics into consideration. In this paper, we propose Value Decomposition with Future Prediction (VDFP), providing an explicit two-step understanding of the value estimation process: 1) first foresee the latent future, 2) then evaluate it. We analytically decompose the value function into a latent future dynamics part and a policy-independent trajectory return part, inducing a way to model latent dynamics and returns separately in value estimation. Further, we derive a practical deep RL algorithm, consisting of a convolutional model to learn compact trajectory representations from past experiences, a conditional variational auto-encoder to predict the latent future dynamics, and a convex return model that evaluates the trajectory representation. In experiments, we empirically demonstrate the effectiveness of our approach for both off-policy and on-policy RL in several OpenAI Gym continuous control tasks, as well as a few challenging variants with delayed rewards.
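A minimal sketch of the two-step evaluation that VDFP describes is given below; the module names and sizes, and the simplification of the conditional VAE to a deterministic future predictor, are our own assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class ValueByFuturePrediction(nn.Module):
    """Schematic two-step value estimate in the spirit of VDFP:
    (1) predict a latent representation of the future trajectory from the
    current state, (2) map that representation to a return.
    Sizes and layers are illustrative only."""
    def __init__(self, state_dim=17, latent_dim=32):
        super().__init__()
        self.future_predictor = nn.Sequential(   # stands in for the conditional VAE
            nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        self.return_model = nn.Sequential(       # policy-independent return head
            nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, state):
        z_future = self.future_predictor(state)  # step 1: foresee the latent future
        return self.return_model(z_future)       # step 2: evaluate it

print(ValueByFuturePrediction()(torch.randn(2, 17)).shape)  # torch.Size([2, 1])
```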
Yao Tang, 2020
In the era of noisy intermediate-scale quantum (NISQ) devices, executing quantum algorithms on actual quantum hardware faces unique challenges. One such challenge is that quantum devices in this era have restricted connectivity: quantum gates are allowed to act only on specific pairs of physical qubits. For this reason, a quantum circuit needs to go through a compiling process called qubit routing before it can be executed on a quantum computer. In this study, we propose a CNOT synthesis method called the token reduction method to solve this problem. The token reduction method works for all quantum computers whose architecture is represented by a connected graph. A major difference between our method and existing ones is that our method synthesizes a circuit to an output qubit mapping that might differ from the input qubit mapping. The final mapping for the synthesis is determined dynamically during the synthesis process. Results show that our algorithm consistently outperforms the best publicly accessible algorithm for all of the tested quantum architectures.
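For context, the snippet below shows a naive SWAP-insertion routing baseline on a coupling graph; it is not the token reduction method, only an illustration of the qubit-routing problem that such CNOT synthesis addresses. The function name and gate tuple format are our own.

```python
import networkx as nx

def route_cnots(cnots, coupling_edges):
    """Naive routing baseline: for each CNOT on non-adjacent physical qubits,
    insert SWAPs along a shortest path in the coupling graph, then apply the
    CNOT. Returns a list of (gate, qubit, qubit) tuples on physical qubits."""
    g = nx.Graph(coupling_edges)
    routed, mapping = [], {q: q for q in g.nodes}   # logical -> physical
    for ctrl, tgt in cnots:
        path = nx.shortest_path(g, mapping[ctrl], mapping[tgt])
        for a, b in zip(path, path[1:-1]):          # walk the control toward the target
            routed.append(("SWAP", a, b))
            inv = {v: k for k, v in mapping.items()}
            mapping[inv[a]], mapping[inv[b]] = b, a  # update mapping after each SWAP
        routed.append(("CNOT", mapping[ctrl], mapping[tgt]))
    return routed

# Example: a line architecture 0-1-2-3 and a CNOT between distant qubits 0 and 3
print(route_cnots([(0, 3)], [(0, 1), (1, 2), (2, 3)]))
```

Note that, like the method described above, this toy router ends with an output qubit mapping that differs from the input one; unlike the token reduction method, it makes no attempt to minimize the resulting CNOT count.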
Thirty massive clumps associated with bright infrared sources were observed with the APEX telescope in the CO(4-3) and C$^{17}$O(3-2) lines to detect infall signatures and characterize infall properties in the envelopes of the massive clumps. Eighteen objects show blue profiles in the CO(4-3) line with virial parameters less than 2, suggesting that global collapse is taking place in these massive clumps. The CO(4-3) lines were fitted with the two-layer model in order to obtain infall velocities and mass infall rates. The derived mass infall rates range from 10$^{-3}$ to 10$^{-1}$ M$_{\odot}$ yr$^{-1}$. A positive relationship between clump mass and infall rate appears to indicate that gravity plays a dominant role in the collapse process. Clumps with higher luminosity have larger mass infall rates, implying that clumps with higher mass infall rates have higher star formation rates.
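A common back-of-the-envelope estimate of the mass infall rate, $\dot{M}_{\rm in} \approx 4\pi R^2 \rho V_{\rm in} = 3 M_{\rm clump} V_{\rm in}/R$, is sketched below with placeholder clump parameters (not values from the paper) to show how rates of order 10$^{-3}$--10$^{-1}$ M$_{\odot}$ yr$^{-1}$ arise.

```python
# Back-of-the-envelope mass infall rate:
#   Mdot = 4*pi*R^2 * rho * V_in  with  rho = 3*M_clump / (4*pi*R^3)
#        = 3 * M_clump * V_in / R
# The clump mass, radius, and infall velocity below are illustrative only.
M_sun = 1.989e33      # g
pc = 3.086e18         # cm
yr = 3.156e7          # s

def mass_infall_rate(M_clump_msun, R_pc, V_in_kms):
    M = M_clump_msun * M_sun
    R = R_pc * pc
    V = V_in_kms * 1e5                         # km/s -> cm/s
    return 3.0 * M * V / R / M_sun * yr        # in M_sun / yr

print(f"Mdot ~ {mass_infall_rate(1e3, 0.5, 1.0):.1e} Msun/yr")   # ~6e-3 Msun/yr
```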
We study the Policy-extended Value Function Approximator (PeVFA) in Reinforcement Learning (RL), which extends the conventional value function approximator (VFA) to take as input not only the state (and action) but also an explicit policy representation. Such an extension enables PeVFA to preserve the values of multiple policies at the same time and brings an appealing characteristic, i.e., \emph{value generalization among policies}. We formally analyze the value generalization under Generalized Policy Iteration (GPI). From theoretical and empirical lenses, we show that the generalized value estimates offered by PeVFA may have lower initial approximation error with respect to the true values of successive policies, which is expected to improve consecutive value approximation during GPI. Based on these clues, we introduce a new form of GPI with PeVFA which leverages value generalization along the policy improvement path. Moreover, we propose a representation learning framework for RL policies, providing several approaches to learn effective policy embeddings from policy network parameters or state-action pairs. In our experiments, we evaluate the efficacy of value generalization offered by PeVFA and policy representation learning in several OpenAI Gym continuous control tasks. For a representative instance of algorithm implementation, Proximal Policy Optimization (PPO) re-implemented under the paradigm of GPI with PeVFA achieves about 40% performance improvement over its vanilla counterpart in most environments.
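A compact sketch of what a policy-extended critic can look like is given below; the network sizes are illustrative, and flattening the policy network's parameters is just one of the representation options mentioned above (parameters or state-action pairs), shown here in its simplest form.

```python
import torch
import torch.nn as nn

class PeVFASketch(nn.Module):
    """Sketch of a policy-extended value function approximator: the critic
    receives a learned policy embedding alongside the state, so a single
    network can represent the values of many policies. Sizes are illustrative."""
    def __init__(self, state_dim=17, policy_embed_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + policy_embed_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1))

    def forward(self, state, policy_embedding):
        return self.net(torch.cat([state, policy_embedding], dim=-1))

def flatten_policy_params(policy_net):
    """Simplest possible policy representation: a flat vector of the policy
    network's parameters, which an encoder could then compress."""
    return torch.cat([p.detach().flatten() for p in policy_net.parameters()])
```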
Yao Tang, Man Hon Cheung, 2019
Unmanned aerial vehicles (UAVs) can enhance the performance of cellular networks, due to their high mobility and efficient deployment. In this paper, we present a first study on how user mobility affects the UAV trajectories of a multiple-UAV-assisted wireless communication system. Specifically, we consider UAVs deployed as aerial base stations to serve ground users who move between different regions. We maximize the throughput of ground users in the downlink communication by optimizing the UAV trajectories, while taking into account the impact of user mobility, propulsion energy consumption, and mutual interference among UAVs. We formulate the problem as a route selection problem in an acyclic directed graph. Each vertex represents a task associated with a reward on the average user throughput at a region-time point, while each edge is associated with a cost on the propulsion energy consumption during flying and hovering. For the centralized trajectory design, we first propose the shortest path scheme that determines the optimal trajectory for the single-UAV case. We also propose the centralized route selection (CRS) scheme to systematically compute the optimal trajectories for the more general multiple-UAV case. Due to the NP-hardness of the centralized problem, we consider a distributed trajectory design in which each UAV selects its trajectory autonomously, and propose the distributed route selection (DRS) scheme, which converges to a pure-strategy Nash equilibrium within a finite number of iterations.
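For the single-UAV case, route selection on the region-time graph can be illustrated with a simple dynamic program over the acyclic directed graph; the attribute names ('reward', 'cost') and the reward-minus-cost objective below are our own schematic choices, not the paper's exact formulation.

```python
import networkx as nx

def best_trajectory(dag, source, sink):
    """Single-UAV trajectory selection sketched as a maximum-(reward - cost)
    path in an acyclic directed graph: node 'reward' stands for the average
    throughput of a region-time task, edge 'cost' for propulsion energy."""
    best = {n: float("-inf") for n in dag.nodes}
    pred = {}
    best[source] = dag.nodes[source].get("reward", 0.0)
    for u in nx.topological_sort(dag):              # dynamic programming over the DAG
        if best[u] == float("-inf"):
            continue
        for v in dag.successors(u):
            gain = best[u] - dag.edges[u, v].get("cost", 0.0) \
                   + dag.nodes[v].get("reward", 0.0)
            if gain > best[v]:
                best[v], pred[v] = gain, u
    path, node = [sink], sink                       # reconstruct the chosen route
    while node != source:
        node = pred[node]
        path.append(node)
    return list(reversed(path)), best[sink]

# Tiny example: two candidate region-time routes from START to END
g = nx.DiGraph()
for n, r in [("START", 0.0), ("A", 2.0), ("B", 3.0), ("END", 0.0)]:
    g.add_node(n, reward=r)
g.add_edge("START", "A", cost=0.5); g.add_edge("A", "END", cost=0.5)
g.add_edge("START", "B", cost=2.0); g.add_edge("B", "END", cost=2.0)
print(best_trajectory(g, "START", "END"))           # prefers A: higher net reward
```

The multi-UAV CRS and DRS schemes would additionally account for inter-UAV interference when evaluating each candidate route, which is what makes the joint problem NP-hard.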