Research papers and master's and doctoral theses published by He Chen

A Just-In-Time Networking Framework for Minimizing Request-Response Latency of Wireless Time-Sensitive Applications

88 - Lihao Zhang, Soung Chang Liew, He Chen 2021

This paper puts forth a networking paradigm, referred to as just-in-time (JIT) communication, to support client-server applications with stringent request-response latency requirements. Of interest is not just the round-trip delay of the network, but the actual request-response latency experienced by the application. The JIT framework contains two salient features. At the client side, the communication layer will pull a request from the client just when there is an upcoming transmission opportunity from the network. This ensures that the request contains information that is as fresh as possible (e.g., a sensor reading obtained just before the transmission opportunity). At the server side, the network ascertains that the server, after receiving and processing the request to generate a response (e.g., a control command to be sent to the client), will have a transmission opportunity at just this time. We realize the JIT system, including the protocol stack, over a Time-Division-Multiple-Access (TDMA) network implemented on a System-on-Chip (SoC) platform. We prove that a TDMA network with a power-of-2 number of time slots per superframe is optimal for realizing the server-side JIT function. Our experimental results validate that JIT networks can yield significantly lower request-response latency than networks without JIT support.

Networking and Internet Architecture
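
The client-side pull described in the JIT abstract above can be illustrated with a minimal sketch. It is not the authors' protocol stack: the function names, the microsecond time base, and the `prep_us` preparation delay are all assumptions made for illustration. The idea shown is only that the communication layer computes the next owned TDMA slot and asks the application for data as late as possible before it.

```python
def next_slot_start(now_us, slot_us, slots_per_superframe, my_slot):
    """Start time (in microseconds) of the next occurrence of `my_slot`
    at or after `now_us`. All names here are hypothetical."""
    superframe_us = slot_us * slots_per_superframe
    offset = my_slot * slot_us
    # Ceiling division: whole superframes needed so the slot start is >= now_us.
    k = max(0, -(-(now_us - offset) // superframe_us))
    return k * superframe_us + offset


def pull_time(now_us, slot_us, slots_per_superframe, my_slot, prep_us):
    """Latest time at which the request can be pulled from the client and
    still be ready when the slot begins. `prep_us` is an assumed fixed delay
    for reading the sensor and building the packet."""
    start = next_slot_start(now_us, slot_us, slots_per_superframe, my_slot)
    if start - prep_us < now_us:
        # Too late to prepare for this slot; target the next superframe.
        start += slot_us * slots_per_superframe
    return start - prep_us


if __name__ == "__main__":
    # 8 slots per superframe (a power of 2), 1 ms slots, client owns slot 3.
    print(pull_time(now_us=0, slot_us=1000, slots_per_superframe=8,
                    my_slot=3, prep_us=200))
```

Pulling at `pull_time` rather than queueing a request as soon as the application produces one is what keeps the transmitted reading fresh in this sketch.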

Drastically enhanced cation incorporation in the epitaxy of oxides due to formation and evaporation of suboxides from elemental sources

275 - Georg Hoffmann, Zongzhe Cheng, Oliver Brandt 2021

In the molecular beam epitaxy of oxide films, the cation (Sn, Ga) or dopant (Sn) incorporation does not follow the vapor pressure of the elemental metal sources, but is enhanced by several orders of magnitude for low source temperatures. Using line-of-sight quadrupole mass spectrometry, we identify the dominant contribution to the total flux emanating from Sn and Ga sources at these temperatures to be due to the unintentional formation and evaporation of the respective suboxides SnO and Ga$_{2}$O. We quantitatively describe this phenomenon by a rate-equation model that takes into account the O background pressure, the resulting formation of the suboxides via oxidation of the metal source, and their subsequent thermally activated evaporation. As a result, the total flux composed of the metal and the suboxide fluxes exhibits an S-shaped temperature dependence instead of the expected linear one in an Arrhenius plot, in excellent agreement with the available experimental data. Our model reveals that the thermally activated regimes at low and high temperatures are almost exclusively due to suboxide and metal evaporation, respectively, joined by an intermediate plateau-like regime in which the flux is limited by the available amount of O. An important suboxide contribution is expected for all elemental sources whose suboxide exhibits a higher vapor pressure than the element, such as B, Ga, In, La, Si, Ge, Sn, Sb, Mo, Nb, Ru, Ta, V, and W. This contribution can play a decisive role in the molecular beam epitaxy of oxides, including multicomponent or complex oxides, from elemental sources. Finally, our model predicts suboxide-dominated growth in low-pressure chemical vapor deposition of Ga$_{2}$O$_{3}$ and In$_{2}$O$_{3}$.

Materials Science, Applied Physics, Chemical Physics
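
The three regimes named in the suboxide abstract above (suboxide-dominated, O-limited plateau, metal-dominated) can be reproduced qualitatively with a toy calculation. This is not the authors' rate-equation model: it simply caps a thermally activated suboxide term at an assumed O-supply limit and adds a thermally activated metal term. Every prefactor, activation energy, and the limit itself are invented for illustration, and the flux units are arbitrary.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius(prefactor, e_act_ev, temp_k):
    """Thermally activated rate: prefactor * exp(-E_a / (k_B * T))."""
    return prefactor * math.exp(-e_act_ev / (K_B_EV * temp_k))


def total_flux(temp_k, metal=(1e25, 3.0), suboxide=(1e22, 1.5), o_limit=1e12):
    """Toy total flux from a heated source (arbitrary units).

    `metal` and `suboxide` are (prefactor, activation energy in eV) pairs and
    `o_limit` is the largest suboxide flux the background O can sustain.
    All default values are illustrative assumptions, not fitted parameters.
    """
    metal_flux = arrhenius(*metal, temp_k)
    # Suboxide evaporation is thermally activated until the O supply caps it.
    suboxide_flux = min(arrhenius(*suboxide, temp_k), o_limit)
    return metal_flux + suboxide_flux


if __name__ == "__main__":
    for temp_k in (600, 700, 900, 1000, 1200, 1400):
        print(f"{temp_k:5d} K  1000/T = {1000 / temp_k:.3f}  "
              f"flux = {total_flux(temp_k):.3e}")
```

Plotting `log(total_flux)` against `1/T` for these assumed values gives two sloped segments joined by a flat one, which is the S-shape the abstract refers to.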

Joint LED Selection and Precoding Optimization for Multiple-User Multiple-Cell VLC Systems

92 - Yang Yang, Yujie Yang, Mingzhe Chen 2021

This paper proposes a hybrid dimming scheme based on joint LED selection and precoding design (TASP-HD) for multiple-user (MU) multiple-cell (MC) visible light communications (VLC) systems. In TASP-HD, both the LED selection and the precoding of each cell can be dynamically adjusted to reduce the intra- and inter-cell interferences while satisfying illumination constraints. First, a MU-MC-VLC system model is established, and then a sum-rate maximization problem under dimming level and illumination uniformity constraints is formulated. In this problem, the indices of activated LEDs and the precoding matrices are optimized jointly, which results in a complex non-convex mixed integer problem. To solve it, the original problem is separated into two subproblems. The first subproblem, which maximizes the sum-rate of users by optimizing the LED selection with a given precoding matrix, is a mixed integer problem solved by the penalty method. With the optimized LED selection matrix, the second subproblem, which maximizes the sum-rate by optimizing the precoding matrix, is solved by the Lagrangian dual method. Finally, these two subproblems are iteratively solved to obtain a convergent solution. Simulation results verify that in a typical indoor scenario under a dimming level of 70%, the mean bandwidth efficiency of TASP-HD is 4.8 bit/s/Hz and 7.13 bit/s/Hz greater than AD and DD, respectively.

Systems and Control

Machine learning-based aerosol characterization using OCO-2 O2 A-band observations

323 - Sihe Chen, Vijay Natraj, Zhao-Cheng Zeng 2021

Aerosol scattering influences the retrieval of the column-averaged dry-air mole fraction of CO2 (XCO2) from the Orbiting Carbon Observatory-2 (OCO-2). This is especially true for surfaces with reflectance close to a critical value where there is very low sensitivity to aerosol loading. A spectral sorting approach was introduced to improve the characterization of aerosols over coastal regions. Here, we generalize this procedure to land surfaces and use a two-step neural network to retrieve aerosol parameters from OCO-2 measurements. We show that, by using a combination of radiance measurements in the continuum and inside the absorption band, both the aerosol optical depth and layer height, as well as their uncertainties, can be accurately predicted. Using the improved aerosol estimates as a priori, we demonstrate that the accuracy of the XCO2 retrieval can be significantly improved compared to the OCO-2 Level-2 Standard product. Furthermore, using simulated observations, we obtain estimates of the error in the retrieved XCO2. These simulations indicate that the bias-corrected OCO-2 Lite data, which is used for flux

Atmospheric and Oceanic Physics

Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-training

90 - Yuqing Song, Shizhe Chen, Qin Jin 2021

Translating e-commerce product descriptions, a.k.a. product-oriented machine translation (PMT), is essential to serve e-shoppers all over the world. However, due to the domain specialty, the PMT task is more challenging than traditional machine translation problems. Firstly, product descriptions contain much specialized jargon, which is ambiguous to translate without the product image. Secondly, product descriptions are related to the image in more complicated ways than standard image descriptions, involving various visual aspects such as objects, shapes, colors or even subjective styles. Moreover, existing PMT datasets are too small in scale to support the research. In this paper, we first construct a large-scale bilingual product description dataset called Fashion-MMT, which contains over 114k noisy and 40k manually cleaned description translations with multiple product images. To effectively learn semantic alignments among product images and bilingual texts in translation, we design a unified product-oriented cross-modal cross-lingual model (UPOC$^2$) for pre-training and fine-tuning. Experiments on the Fashion-MMT and Multi30k datasets show that our model significantly outperforms the state-of-the-art models even pre-trained on the same dataset. It is also shown to benefit more from large-scale noisy data to improve the translation quality. We will release the dataset and codes at https://github.com/syuqings/Fashion-MMT.

Computer Vision and Pattern Recognition

Temporal Induced Self-Play for Stochastic Bayesian Games

128 - Weizhe Chen, Zihan Zhou, Yi Wu 2021

One practical requirement in solving dynamic games is to ensure that the players play well from any decision point onward. To satisfy this requirement, existing efforts focus on equilibrium refinement, but the scalability and applicability of existing techniques are limited. In this paper, we propose Temporal-Induced Self-Play (TISP), a novel reinforcement learning-based framework to find strategies with decent performance from any decision point onward. TISP uses belief-space representation, backward induction, policy learning, and non-parametric approximation. Building upon TISP, we design a policy-gradient-based algorithm TISP-PG. We prove that TISP-based algorithms can find approximate Perfect Bayesian Equilibrium in zero-sum one-sided stochastic Bayesian games with finite horizon. We test TISP-based algorithms in various games, including finitely repeated security games and a grid-world game. The results show that TISP-PG is more scalable than existing mathematical programming-based methods and significantly outperforms other learning-based methods.

Multiagent Systems, Computer Science and Game Theory, Machine Learning

Airbert: In-domain Pretraining for Vision-and-Language Navigation

105 - Pierre-Louis Guhur, Makarand Tapaswi, Shizhe Chen 2021

Vision-and-language navigation (VLN) aims to enable embodied agents to navigate in realistic environments using natural language instructions. Given the scarcity of domain-specific training data and the high diversity of image and language inputs, the generalization of VLN agents to unseen environments remains challenging. Recent methods explore pretraining to improve generalization; however, the use of generic image-caption datasets or existing small-scale VLN environments is suboptimal and results in limited improvements. In this work, we introduce BnB, a large-scale and diverse in-domain VLN dataset. We first collect image-caption (IC) pairs from hundreds of thousands of listings from online rental marketplaces. Using IC pairs we next propose automatic strategies to generate millions of VLN path-instruction (PI) pairs. We further propose a shuffling loss that improves the learning of temporal order inside PI pairs. We use BnB to pretrain our Airbert model, which can be adapted to discriminative and generative settings, and show that it outperforms the state of the art on the Room-to-Room (R2R) navigation and Remote Referring Expression (REVERIE) benchmarks. Moreover, our in-domain pretraining significantly increases performance on a challenging few-shot VLN evaluation, where we train the model only on VLN instructions from a few houses.

Computer Vision and Pattern Recognition, Artificial Intelligence, Computation and Language
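
The Airbert abstract above mentions a shuffling loss for learning temporal order but gives no details. One plausible way to build training candidates for such an objective is sketched below, purely as an assumption: the correctly ordered path is labeled 1 and randomly shuffled copies are labeled 0, so that a model could be trained to pick the ordered one. The function name, labels, and image identifiers are hypothetical and do not come from the paper or its code.

```python
import random


def make_order_candidates(path, num_negatives=3, seed=0):
    """Return [(sequence, label)] with the original path first (label 1)
    followed by `num_negatives` shuffled copies (label 0).

    `path` is a list of hashable viewpoint or image identifiers.
    """
    rng = random.Random(seed)
    original = list(path)
    candidates = [(original, 1)]
    if len(set(original)) < 2:
        # No reordering can differ from the original, so no negatives exist.
        return candidates
    while len(candidates) < num_negatives + 1:
        shuffled = list(original)
        rng.shuffle(shuffled)
        if shuffled != original:  # keep only genuinely reordered paths
            candidates.append((shuffled, 0))
    return candidates


if __name__ == "__main__":
    for sequence, label in make_order_candidates(
            ["img_a", "img_b", "img_c", "img_d"]):
        print(label, sequence)
```

The instruction text would stay fixed across candidates in this sketch, so the only signal separating the positive from the negatives is the order of the visual sequence.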

Question-controlled Text-aware Image Captioning

208 - Anwen Hu, Shizhe Chen, Qin Jin 2021

For an image with multiple scene texts, different people may be interested in different text information. Current text-aware image captioning models are not able to generate distinctive captions according to various information needs. To explore how to generate personalized text-aware captions, we define a new challenging task, namely Question-controlled Text-aware Image Captioning (Qc-TextCap). With questions as control signals, this task requires models to understand questions, find related scene texts and describe them together with objects fluently in human language. Based on two existing text-aware captioning datasets, we automatically construct two datasets, ControlTextCaps and ControlVizWiz, to support the task. We propose a novel Geometry and Question Aware Model (GQAM). GQAM first applies a Geometry-informed Visual Encoder to fuse region-level object features and region-level scene text features while considering spatial relationships. Then, we design a Question-guided Encoder to select the most relevant visual features for each question. Finally, GQAM generates a personalized text-aware caption with a Multimodal Decoder. Our model achieves better captioning performance and question answering ability than carefully designed baselines on both datasets. With questions as control signals, our model generates more informative and diverse captions than the state-of-the-art text-aware captioning model. Our code and datasets are publicly available at https://github.com/HAWLYQ/Qc-TextCap.

Computer Vision and Pattern Recognition, Multimedia

ICECAP: Information Concentrated Entity-aware Image Captioning

182 - Anwen Hu, Shizhe Chen, Qin Jin 2021

Most current image captioning systems focus on describing general image content, and lack background knowledge to deeply understand the image, such as exact named entities or concrete events. In this work, we focus on the entity-aware news image captioning task, which aims to generate informative captions by leveraging the associated news articles to provide background knowledge about the target image. However, due to the length of news articles, previous works only employ news articles at the coarse article or sentence level, which are not fine-grained enough to refine relevant events and choose named entities accurately. To overcome these limitations, we propose an Information Concentrated Entity-aware news image CAPtioning (ICECAP) model, which progressively concentrates on relevant textual information within the corresponding news article from the sentence level to the word level. Our model first creates coarse concentration on relevant sentences using a cross-modality retrieval model and then generates captions by further concentrating on relevant words within the sentences. Extensive experiments on both BreakingNews and GoodNews datasets demonstrate the effectiveness of our proposed method, which outperforms other state-of-the-art methods. The code of ICECAP is publicly available at https://github.com/HAWLYQ/ICECAP.

Computer Vision and Pattern Recognition, Multimedia

Efficient Human Pose Estimation by Maximizing Fusion and High-Level Spatial Attention

101 - Zhiyuan Ren, Yaohai Zhou, Yizhe Chen 2021

In this paper, we propose an efficient human pose estimation network -- SFM (slender fusion model) by fusing multi-level features and adding lightweight attention blocks -- HSA (High-Level Spatial Attention). Many existing methods for efficient networks have already taken feature fusion into consideration, which largely boosts the performance. However, their performance is far inferior to that of large networks such as ResNet and HRNet due to the limited fusion operations in the network. Specifically, we expand the number of fusion operations by building bridges between two pyramid frameworks without adding layers. Meanwhile, to capture long-range dependency, we propose a lightweight attention block -- HSA, which computes a second-order attention map. In summary, SFM maximizes the number of feature fusions in a limited number of layers. HSA learns highly precise spatial information by computing the attention of the spatial attention map. With the help of SFM and HSA, our network is able to generate multi-level features and extract precise global spatial information with few computing resources. Thus, our method achieves comparable or even better accuracy with fewer parameters and lower computational cost. Our SFM achieves 89.0 in [email protected], 42.0 in [email protected] on the MPII validation set and 71.7 in AP, 90.7 in [email protected] on COCO validation with only 1.7G FLOPs and 1.5M parameters. The source code will be public soon.

Computer Vision and Pattern Recognition
