Research papers and master's and doctoral theses published by Ruihan Yang

Supervised Compression for Resource-constrained Edge Computing Systems

298 - Yoshitomo Matsubara, Ruihan Yang, Marco Levorato 2021

There has been much interest in deploying deep learning algorithms on low-powered devices, including smartphones, drones, and medical sensors. However, full-scale deep neural networks are often too resource-intensive in terms of energy and storage. As a result, the bulk of the machine learning operation is often carried out on an edge server, where the data is compressed and transmitted. However, compressing data (such as images) leads to transmitting information irrelevant to the supervised task. Another popular approach is to split the deep network between the device and the server while compressing intermediate features. To date, however, such split computing strategies have barely outperformed the aforementioned naive data compression baselines due to their inefficient approaches to feature compression. This paper adopts ideas from knowledge distillation and neural image compression to compress intermediate feature representations more efficiently. Our supervised compression approach uses a teacher model and a student model with a stochastic bottleneck and learnable prior for entropy coding. We compare our approach to various neural image and feature compression baselines in three vision tasks and find that it achieves better supervised rate-distortion performance while also maintaining smaller end-to-end latency. We furthermore show that the learned feature representations can be tuned to serve multiple downstream tasks.

Computer Vision and Pattern Recognition, Machine Learning

Insights from Generative Modeling for Neural Video Compression

240 - Ruihan Yang, Yibo Yang, Joseph Marino 2021

While recent machine learning research has revealed connections between deep generative models such as VAEs and rate-distortion losses used in learned compression, most of this work has focused on images. In a similar spirit, we view recently proposed neural video coding algorithms through the lens of deep autoregressive and latent variable modeling. We present recent neural video codecs as instances of a generalized stochastic temporal autoregressive transform, and propose new avenues for further improvements inspired by normalizing flows and structured priors. We propose several architectures that yield state-of-the-art video compression performance on full-resolution video and discuss their tradeoffs and ablations. In particular, we propose (i) improved temporal autoregressive transforms, (ii) improved entropy models with structured and temporal dependencies, and (iii) variable bitra

Image and Video Processing, Computer Vision and Pattern Recognition, Machine Learning

Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers

546 - Ruihan Yang, Minghao Zhang, Nicklas Hansen 2021

We propose to address quadrupedal locomotion tasks using Reinforcement Learning (RL) with a Transformer-based model that learns to combine proprioceptive information and high-dimensional depth sensor inputs. While learning-based locomotion has made great advances using RL, most methods still rely on domain randomization for training blind agents that generalize to challenging terrains. Our key insight is that proprioceptive states only offer contact measurements for immediate reaction, whereas an agent equipped with visual sensory observations can learn to proactively maneuver environments with obstacles and uneven terrain by anticipating changes in the environment many steps ahead. In this paper, we introduce LocoTransformer, an end-to-end RL method for quadrupedal locomotion that leverages a Transformer-based model for fusing proprioceptive states and visual observations. We evaluate our method in challenging simulated environments with different obstacles and uneven terrain. We show that our method obtains significant improvements over policies with only proprioceptive state inputs, and that Transformer-based models further improve generalization across environments. Our project page with videos is at https://RchalYang.github.io/LocoTransformer .

Machine Learning, Computer Vision and Pattern Recognition, Robotics

A large modulation of electron-phonon coupling and an emergent superconducting dome in doped strong ferroelectrics

166 - Jiaji Ma, Ruihan Yang, Hanghui Chen 2021

We use first-principles methods to study doped strong ferroelectrics (taking BaTiO$_3$ as a prototype). Here we find a strong coupling between itinerant electrons and soft polar phonons in doped BaTiO$_3$, contrary to Anderson/Blount's weakly coupled electron mechanism for ferroelectric-like metals. As a consequence, across a polar-to-centrosymmetric phase transition in doped BaTiO$_3$, the total electron-phonon coupling is increased to about 0.6 around the critical concentration, which is sufficient to induce phonon-mediated superconductivity of about 2 K. Lowering the crystal symmetry of doped BaTiO$_3$ by imposing epitaxial strain can further increase the superconducting temperature via a sizable coupling between itinerant electrons and acoustic phonons. Our work demonstrates a viable approach to modulating electron-phonon coupling and inducing phonon-mediated superconductivity in doped strong ferroelectrics and potentially in polar metals. Our results also show that the weakly coupled electron mechanism for ferroelectric-like metals is not necessarily present in doped strong ferroelectrics.

Materials Science, Superconductivity

Hierarchical Autoregressive Modeling for Neural Video Compression

111 - Ruihan Yang, Yibo Yang, Joseph Marino 2020

Recent work by Marino et al. (2020) showed improved performance in sequential density estimation by combining masked autoregressive flows with hierarchical latent variable models. We draw a connection between such autoregressive generative models and the task of lossy video compression. Specifically, we view recent neural video compression methods (Lu et al., 2019; Yang et al., 2020b; Agustsson et al., 2020) as instances of a generalized stochastic temporal autoregressive transform, and propose avenues for enhancement based on this insight. Comprehensive evaluations on large-scale video data show improved rate-distortion performance over both state-of-the-art neural and conventional video compression methods.

Image and Video Processing, Machine Learning

Multi-Task Reinforcement Learning with Soft Modularization

477 - Ruihan Yang, Huazhe Xu, Yi Wu 2020

Multi-task learning is a very challenging problem in reinforcement learning. While training multiple tasks jointly allows the policies to share parameters across different tasks, the optimization problem becomes non-trivial: it remains unclear which parameters in the network should be reused across tasks, and how the gradients from different tasks may interfere with each other. Thus, instead of naively sharing parameters across tasks, we introduce an explicit modularization technique on policy representation to alleviate this optimization issue. Given a base policy network, we design a routing network which estimates different routing strategies to reconfigure the base network for each task. Instead of directly selecting routes for each task, our task-specific policy uses a method called soft modularization to softly combine all the possible routes, which makes it suitable for sequential tasks. We experiment with various robotics manipulation tasks in simulation and show our method improves both sample efficiency and performance over strong baselines by a large margin.

Machine Learning, Artificial Intelligence, Robotics

The complex non-collinear magnetic orderings in Ba2YOsO6: A new approach to tuning spin-lattice interactions and controlling magnetic orderings in frustrated complex oxides

82 - Yue-Wen Fang, Ruihan Yang, Hanghui Chen 2019

Frustrated magnets are one class of fascinating materials that host many intriguing phases such as spin ice, spin liquid and complex long-range magnetic orderings at low temperatures. In this work we use first-principles calculations to find that in a wide range of magnetically frustrated oxides, at zero temperature a number of non-collinear magnetic orderings are more stable than the type-I collinear ordering that is observed at finite temperatures. The emergence of non-collinear orderings in those complex oxides is due to higher-order exchange interactions that originate from second-row and third-row transition metal elements. This implies a collinear-to-noncollinear spin transition at sufficiently low temperatures in those frustrated complex oxides. Furthermore, we find that in a particular oxide Ba$_2$YOsO$_6$, experimentally feasible uniaxial strain can tune the material between two different non-collinear magnetic orderings. Our work predicts new non-collinear magnetic orderings in frustrated complex oxides at very low temperatures and provides a mechanical route to tuning complex non-collinear magnetic orderings in those materials.

Materials Science, Strongly Correlated Electrons, Computational Physics

Deep Music Analogy Via Latent Representation Disentanglement

234 - Ruihan Yang, Dingsu Wang, Ziyu Wang 2019

Analogy-making is a key method for computer algorithms to generate both natural and creative music pieces. In general, an analogy is made by partially transferring the music abstractions, i.e., high-level representations and their relationships, from one piece to another; however, this procedure requires disentangling music representations, which usually takes little effort for musicians but is non-trivial for computers. Three sub-problems arise: extracting latent representations from the observation, disentangling the representations so that each part has a unique semantic interpretation, and mapping the latent representations back to actual music. In this paper, we contribute an explicitly-constrained variational autoencoder (EC$^2$-VAE) as a unified solution to all three sub-problems. We focus on disentangling the pitch and rhythm representations of 8-beat music clips conditioned on chords. In producing music analogies, this model helps us to realize the imaginary situation of what if a piece is composed using a different pitch contour, rhythm pattern, or chord progression by borrowing the representations from other pieces. Finally, we validate the proposed disentanglement method using objective measurements and evaluate the analogy examples by a subjective study.

Computer Audio Systems, Information Retrieval, Machine Learning

Learning Efficient and Effective Exploration Policies with Counterfactual Meta Policy

76 - Ruihan Yang, Qiwei Ye, Tie-Yan Liu 2019

A fundamental issue in reinforcement learning algorithms is the balance between exploration of the environment and exploitation of information already obtained by the agent. In particular, exploration plays a critical role in both the efficiency and the efficacy of the learning process. However, existing exploration methods rely on task-agnostic designs that may perform well in one environment but be ill-suited to another. To learn an effective and efficient exploration policy in an automated manner, we formalize a feasible metric for measuring the utility of exploration based on the idea of counterfactuals. Based on that metric, we propose an end-to-end algorithm that learns an exploration policy by meta-learning. We demonstrate that our method achieves good results compared to previous works on high-dimensional control tasks in the MuJoCo simulator.

Machine Learning

Inspecting and Interacting with Meaningful Music Representations using VAE

57 - Ruihan Yang, Tianyao Chen, Yiyi Zhang 2019

Variational Autoencoders (VAEs) have already achieved great results on image generation and recently made promising progress on music generation. However, the generation process is still quite difficult to control in the sense that the learned latent representations lack meaningful music semantics. It would be much more useful if people could modify certain music features, such as rhythm and pitch contour, via latent representations to test different composition ideas. In this paper, we propose a new method to inspect the pitch and rhythm interpretations of the latent representations, which we name disentanglement by augmentation. Based on the interpretable representations, an intuitive graphical user interface is designed for users to better direct the music creation process by manipulating the pitch contours and rhythmic complexity.

Computer Audio Systems, Human-Computer Interaction, Information Retrieval
