Zhendong Li, Wen Chen, 2021
In this paper, we investigate a more efficient transmissive reconfigurable meta-surface (RMS) transmitter, which has the potential to realize ultra-massive multiple-input multiple-output (MIMO) for sixth-generation (6G) mobile communication owing to its low cost and low power consumption. Since the RMS is passive, it can reduce power consumption while satisfying the high-capacity requirements of 6G networks. For the proposed architecture, we elaborate on the transmissive RMS transmitter architecture, channel model, channel estimation, downlink (DL) signal modulation, and beamforming design. Finally, several potential directions for future research are given.
We give a pedagogical review of how concepts from quantum information theory build up the gravitational side of the AdS/CFT correspondence. The review is self-contained in that it only presupposes knowledge of quantum mechanics and general relativity; other tools, including holographic duality itself, are introduced in the text. We have aimed to give researchers interested in entering this field a working knowledge sufficient for initiating original projects. The review begins with the laws of black hole thermodynamics, which form the basis of this subject, then introduces the Ryu-Takayanagi proposal, the JLMS relation, and subregion duality. We discuss tensor networks as a visualization tool and analyze various network architectures in detail. Next, several modern concepts and techniques are discussed: Renyi entropies and the replica trick, differential entropy and kinematic space, modular Berry phases, modular minimal entropy, entanglement wedge cross sections, bit threads, and others. We discuss the extent to which bulk geometries are fixed by boundary entanglement entropies, and analyze relations, such as the monogamy of mutual information, that boundary entanglement entropies must obey if a state has a semiclassical bulk dual. We close with a discussion of black holes, including holographic complexity, firewalls and the black hole information paradox, islands, and replica wormholes.
With the rise of deep learning over the past decade, there has been steady innovation, with breakthroughs that convincingly push the state of the art in cross-modal analytics between vision and language in the multimedia field. Nevertheless, there has not been an open-source codebase that supports training and deploying numerous neural network models for cross-modal analytics in a unified and modular fashion. In this work, we propose X-modaler, a versatile and high-performance codebase that encapsulates state-of-the-art cross-modal analytics into several general-purpose stages (e.g., pre-processing, encoder, cross-modal interaction, decoder, and decode strategy). Each stage provides a series of modules widely adopted in state-of-the-art methods and allows seamless switching between them. This design naturally enables flexible implementation of state-of-the-art algorithms for image captioning, video captioning, and vision-language pre-training, with the aim of facilitating rapid development in the research community. Meanwhile, since the effective modular designs in several stages (e.g., cross-modal interaction) are shared across different vision-language tasks, X-modaler can be readily extended to power startup prototypes for other tasks in cross-modal analytics, including visual question answering, visual commonsense reasoning, and cross-modal retrieval. X-modaler is an Apache-licensed codebase, and its source code, sample projects, and pre-trained models are available online: https://github.com/YehLi/xmodaler.
An intelligent reflecting surface (IRS)-aided wireless powered mobile edge computing (WP-MEC) system is conceived, where each device's computational task can be divided into two parts: one for local computing and one for offloading to mobile edge computing (MEC) servers. Both time division multiple access (TDMA) and non-orthogonal multiple access (NOMA) schemes are considered for uplink (UL) offloading. Given the capability of IRSs to intelligently reconfigure wireless channels over time, it is fundamentally unknown which multiple access scheme is superior for MEC UL offloading. To answer this question, we first investigate the impact of three different dynamic IRS beamforming (DIBF) schemes on the computation rate of both offloading schemes, based on the flexibility of the IRS in adjusting its beamforming (BF) vector in each transmission frame. Under the DIBF framework, computation rate maximization problems are formulated for both the NOMA and TDMA schemes by jointly optimizing the IRS passive BF and the resource allocation. We rigorously prove that offloading with TDMA achieves the same computation rate as NOMA when all devices share the same IRS BF vector during UL offloading. By contrast, offloading with TDMA outperforms NOMA when the IRS BF vector can be flexibly adapted for UL offloading. Despite the non-convexity of the computation rate maximization problems for each DIBF scheme, which involve highly coupled optimization variables, we conceive computationally efficient algorithms by invoking alternating optimization. Our numerical results demonstrate the significant performance gains achieved by the proposed designs over various benchmark schemes.
Shuowen Chen, Yang Ming, 2021
What causes the countercyclicality of industry-level productivity dispersion in the U.S.? Empirically, we construct an index of negative profit shocks and show that both productivity dispersion and R&D intensity dispersion widen at the onset of the shock and gradually dissipate. Theoretically, we build a duopolistic technology-ladder model in which heterogeneous R&D costs determine firms' post-shock optimal behaviors and the equilibrium technology gap. Quantitatively, we calibrate a parameterized model, simulate firms' post-shock responses, and predict that productivity dispersion is due to the low-cost firm increasing R&D efforts and the high-cost firm doing the opposite. We provide two empirical tests for this mechanism.
To address the limited battery capacity of the large number of widely deployed low-power smart devices in the Internet of Things (IoT), this paper proposes a novel intelligent reflecting surface (IRS) empowered unmanned aerial vehicle (UAV) simultaneous wireless information and power transfer (SWIPT) network framework, in which the IRS is used to reconstruct the wireless channel to enhance the energy transmission efficiency and coverage of UAV SWIPT networks. We formulate an achievable sum-rate maximization problem by jointly optimizing the UAV trajectory, UAV transmission power allocation, power splitting (PS) ratio, and IRS reflection coefficient under a non-linear energy harvesting model. Because the optimization variables are coupled, the problem is non-convex and challenging to solve directly. We first transform the problem, and then apply the alternating optimization (AO) algorithm framework to divide the transformed problem into four blocks. Specifically, by applying successive convex approximation (SCA) and difference-convex (DC) programming, the UAV trajectory, UAV transmission power allocation, PS ratio, and IRS reflection coefficient are optimized alternately, each with the other three held fixed, until convergence is achieved. Numerical simulation results verify the effectiveness of our proposed algorithm compared to other algorithms.
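The alternating optimization pattern mentioned in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy objective, the two scalar blocks `a` and `b`, and the closed-form block updates stand in for the paper's four blocks and its SCA/DC subproblems, which are not reproduced here.

```python
def objective(a, b):
    # Hypothetical toy objective: non-convex jointly in (a, b),
    # but a convex quadratic in each variable when the other is fixed.
    return (a * b - 1.0) ** 2 + 0.1 * (a - b) ** 2


def best_block(other):
    # Exact minimizer of objective over one variable with the other fixed.
    # Setting the derivative 2*other*(x*other - 1) + 0.2*(x - other) to zero
    # gives x = 2.2*other / (2*other**2 + 0.2).
    return 2.2 * other / (2.0 * other ** 2 + 0.2)


def alternating_minimize(a, b, iterations=100):
    # Update one block at a time while holding the other fixed.
    history = [objective(a, b)]
    for _ in range(iterations):
        a = best_block(b)  # block 1: optimize a with b fixed
        b = best_block(a)  # block 2: optimize b with a fixed
        history.append(objective(a, b))
    return a, b, history


if __name__ == "__main__":
    a, b, history = alternating_minimize(2.0, 0.5)
    print(a, b, history[-1])
```

Because each block update is an exact minimizer, the recorded objective never increases from one iteration to the next, which is the property that makes this kind of scheme converge in objective value.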
Constructing appropriate representations of molecules lies at the core of numerous tasks in fields such as materials science, chemistry, and drug design. Recent research abstracts molecules as attributed graphs and employs graph neural networks (GNNs) for molecular representation learning, achieving remarkable results in molecular graph modeling. Albeit powerful, current models either are based on local aggregation operations and thus miss higher-order graph properties, or focus only on node information without fully using the edge information. To address this, we propose a Communicative Message Passing Transformer (CoMPT) neural network to improve the molecular graph representation by reinforcing message interactions between nodes and edges based on the Transformer architecture. Unlike previous transformer-style GNNs that treat molecules as fully connected graphs, we introduce a message diffusion mechanism to leverage the graph connectivity inductive bias and reduce the message enrichment explosion. Extensive experiments demonstrate that the proposed model obtains superior performance (around 4% on average) against state-of-the-art baselines on seven chemical property datasets (graph-level tasks) and two chemical shift datasets (node-level tasks). Further visualization studies also indicate a better representation capacity achieved by our model.
Evolution-based neural architecture search requires high computational resources, resulting in long search times. In this work, we propose CMANAS, a framework that applies the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) to the neural architecture search problem and achieves better results than previous evolution-based methods while reducing the search time significantly. The architectures are modelled using a normal distribution, which is updated using CMA-ES based on the fitness of the sampled population. To reduce the search time, we use the accuracy of a trained one-shot model (OSM) on the validation data as a prediction of the fitness of an individual architecture. We also use an architecture-fitness table (AF table) to keep a record of the already evaluated architectures, thus further reducing the search time. CMANAS finished the architecture search on CIFAR-10 with a top-1 test accuracy of 97.44% in 0.45 GPU days and on CIFAR-100 with a top-1 test accuracy of 83.24% in 0.6 GPU days on a single GPU. The top architectures from the searches on CIFAR-10 and CIFAR-100 were then transferred to ImageNet, achieving top-5 accuracies of 92.6% and 92.1%, respectively.
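The role of the architecture-fitness table can be sketched as follows. This is not the CMANAS implementation: the sketch replaces CMA-ES with a much simpler Gaussian evolution strategy (mean update only, no covariance adaptation), and the architecture encoding, the function names, and `toy_fitness` (a stand-in for one-shot-model validation accuracy) are all hypothetical.

```python
import random


def toy_fitness(arch):
    # Hypothetical stand-in for one-shot-model validation accuracy:
    # rewards architectures that choose operation 2 on many edges.
    return sum(1 for op in arch if op == 2) / len(arch)


def search(fitness_fn, num_edges=4, num_ops=3, pop_size=8, generations=5, seed=0):
    rng = random.Random(seed)
    # One normal distribution per (edge, operation) score.
    # Simplification: only the mean is adapted, unlike real CMA-ES.
    mean = [[0.0] * num_ops for _ in range(num_edges)]
    sigma = 1.0
    af_table = {}  # architecture (tuple of op indices) -> fitness
    evaluations = 0
    for _ in range(generations):
        population = []
        for _ in range(pop_size):
            sample = [[rng.gauss(m, sigma) for m in row] for row in mean]
            # Decode: pick the highest-scoring operation on each edge.
            arch = tuple(max(range(num_ops), key=row.__getitem__) for row in sample)
            if arch not in af_table:
                # Only architectures not seen before are evaluated.
                af_table[arch] = fitness_fn(arch)
                evaluations += 1
            population.append((af_table[arch], sample))
        # Move the mean toward the fitter half of the population.
        population.sort(key=lambda item: item[0], reverse=True)
        elite = population[: pop_size // 2]
        mean = [
            [sum(s[e][o] for _, s in elite) / len(elite) for o in range(num_ops)]
            for e in range(num_edges)
        ]
    best = max(af_table, key=af_table.get)
    return best, af_table[best], evaluations, len(af_table)


if __name__ == "__main__":
    print(search(toy_fitness))
```

The number of fitness evaluations equals the number of distinct architectures sampled, not `pop_size * generations`, which is where the table saves time when the distribution concentrates and resamples the same architectures.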
Modern approaches typically formulate semantic segmentation as a per-pixel classification task, while instance-level segmentation is handled with an alternative mask classification. Our key insight: mask classification is sufficiently general to solve both semantic- and instance-level segmentation tasks in a unified manner using the exact same model, loss, and training procedure. Following this observation, we propose MaskFormer, a simple mask classification model which predicts a set of binary masks, each associated with a single global class label prediction. Overall, the proposed mask classification-based method simplifies the landscape of effective approaches to semantic and panoptic segmentation tasks and shows excellent empirical results. In particular, we observe that MaskFormer outperforms per-pixel classification baselines when the number of classes is large. Our mask classification-based method outperforms both current state-of-the-art semantic (55.6 mIoU on ADE20K) and panoptic segmentation (52.7 PQ on COCO) models.
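To make the idea of mask classification concrete, the sketch below shows one way a set of soft binary masks, each with a class distribution, can be turned into a per-pixel semantic map. The abstract does not spell out an inference rule; the sketch assumes each pixel is scored per class by summing, over masks, the mask's class probability times its mask probability at that pixel. The function name and the nested-list input layout are hypothetical.

```python
def semantic_inference(class_probs, mask_probs):
    # class_probs[i][c]: probability that mask i belongs to class c
    #                    (a "no object" class is assumed already dropped).
    # mask_probs[i][h][w]: probability that pixel (h, w) lies in mask i.
    num_masks = len(class_probs)
    num_classes = len(class_probs[0])
    height = len(mask_probs[0])
    width = len(mask_probs[0][0])
    labels = []
    for h in range(height):
        row = []
        for w in range(width):
            # Score each class by marginalizing over the predicted masks.
            scores = [
                sum(class_probs[i][c] * mask_probs[i][h][w] for i in range(num_masks))
                for c in range(num_classes)
            ]
            row.append(max(range(num_classes), key=scores.__getitem__))
        labels.append(row)
    return labels


if __name__ == "__main__":
    class_probs = [[0.9, 0.1], [0.2, 0.8]]
    mask_probs = [
        [[0.9, 0.9], [0.1, 0.1]],  # mask 0 covers the top row
        [[0.1, 0.1], [0.9, 0.9]],  # mask 1 covers the bottom row
    ]
    print(semantic_inference(class_probs, mask_probs))  # [[0, 0], [1, 1]]
```

Note that the class decision is made once per mask rather than once per pixel; the per-pixel labels fall out of combining the two predictions.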
We address the problem of text-guided video temporal grounding, which aims to identify the time interval of a given event based on a natural language description. Unlike most existing methods, which only consider RGB images as visual features, we propose a multi-modal framework to extract complementary information from videos. Specifically, we adopt RGB images for appearance, optical flow for motion, and depth maps for image structure. While RGB images provide abundant visual cues of a given event, the performance may be affected by background clutter. Therefore, we use optical flow to focus on large motion and depth maps to infer the scene configuration when the action is related to objects recognizable by their shapes. To integrate the three modalities more effectively and enable inter-modal learning, we design a dynamic fusion scheme with transformers to model the interactions between modalities. Furthermore, we apply intra-modal self-supervised learning to enhance feature representations across videos for each modality, which also facilitates multi-modal learning. We conduct extensive experiments on the Charades-STA and ActivityNet Captions datasets, and show that the proposed method performs favorably against state-of-the-art approaches.