Research papers and master's and doctoral theses published by Gang Li

Cross DQN: Cross Deep Q Network for Ads Allocation in Feed

69 - Guogang Liao , Ze Wang , Xiaoxu Wu 2021

E-commerce platforms usually display a mixed list of ads and organic items in the feed. One key problem is to allocate the limited slots in the feed to maximize overall revenue as well as improve user experience, which requires a good model of user preference. Instead of modeling the influence of individual items on user behavior, the arrangement signal models the influence of the arrangement of items and may lead to a better allocation strategy. However, most previous strategies fail to model such a signal and therefore result in suboptimal performance. To this end, we propose Cross Deep Q Network (Cross DQN) to extract the arrangement signal by crossing the embeddings of different items and processing the crossed sequence in the feed. Our model results in higher revenue and better user experience than state-of-the-art baselines in offline experiments. Moreover, our model demonstrates a significant improvement in the online A/B test and has been fully deployed on the Meituan feed to serve more than 300 million customers.

Machine Learning

Revisit prompt $J/\psi$ production in associated with Higgs Boson via gluon fusion at the LHC

122 - Xue-An Pan , Zhong-Ming Niu , Gang Li 2021

The production of charmonium in association with a Higgs boson via gluon fusion was investigated in Ref. [Phys. Rev. D 66, 114002 (2002)], which considered the contribution of final Higgs boson radiation off the charm quark at tree level and found that this process is far too rare to be observable in any of the considered experiments. In this paper, the production of prompt $J/\psi$ in association with a Higgs boson via gluon fusion at the 14 TeV LHC is revisited within the factorization formalism of NRQCD. After including the contribution from final Higgs boson radiation off the top quark in the loop, which is more than three orders of magnitude larger than that from the charm quark at tree level, the production of prompt $J/\psi$ in association with a Higgs boson has great potential to be detected. Prompt $J/\psi$ production includes direct production and indirect production via radiative or hadronic decays of higher excited charmonium states. For direct, loop-induced $J/\psi + H$ production via gluon fusion, the ${}^{3}S^{(8)}_1$ Fock state gives the dominant contribution to the cross section, about 95% of the total direct production. The loop-induced indirect contribution is appreciable: the sum of the contributions from $\psi(2S) + H$, $\chi_{c1} + H$ and $\chi_{c2} + H$ is about $34\%$ of the total cross section of prompt $J/\psi + H$. The indirect contribution from $\chi_{c0} + H$, by contrast, is tiny and can be neglected. With its great potential to be detected, prompt $J/\psi$ production in association with a Higgs boson can help us to further understand the colour-octet mechanism, and can be useful for further investigating the coupling of the Higgs boson to fermions.

High Energy Physics - Phenomenology

GRIM: A General, Real-Time Deep Learning Inference Framework for Mobile Devices based on Fine-Grained Structured Weight Sparsity

107 - Wei Niu , Zhengang Li , Xiaolong Ma 2021

It is appealing but challenging to achieve real-time deep neural network (DNN) inference on mobile devices because even powerful modern mobile devices are considered "resource-constrained" when executing large-scale DNNs. This necessitates sparse model inference via weight pruning, i.e., DNN weight sparsity, and it is desirable to design a new DNN weight sparsity scheme that can facilitate real-time inference on mobile devices while preserving high sparse model accuracy. This paper designs a novel mobile inference acceleration framework, GRIM, that is General to both convolutional neural networks (CNNs) and recurrent neural networks (RNNs) and that achieves Real-time execution and high accuracy, leveraging fine-grained structured sparse model Inference and compiler optimizations for Mobiles. We start by proposing a new fine-grained structured sparsity scheme through Block-based Column-Row (BCR) pruning. Based on this new fine-grained structured sparsity, our GRIM framework consists of two parts: (a) the compiler optimization and code generation for real-time mobile inference; and (b) the BCR pruning optimizations for determining pruning hyperparameters and performing weight pruning. We compare GRIM with Alibaba MNN, TVM, TensorFlow-Lite, a sparse implementation based on CSR, PatDNN, and ESE (a representative FPGA inference acceleration framework for RNNs), and achieve up to 14.08x speedup.

Machine Learning; Artificial Intelligence

Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning

366 - Bryan Wang , Gang Li , Xin Zhou 2021

Mobile User Interface Summarization generates succinct language descriptions of mobile screens for conveying important contents and functionalities of the screen, which can be useful for many language-based application scenarios. We present Screen2Words, a novel screen summarization approach that automatically encapsulates essential information of a UI screen into a coherent language phrase. Summarizing mobile screens requires a holistic understanding of the multi-modal data of mobile UIs, including text, image, structures as well as UI semantics, motivating our multi-modal learning approach. We collected and analyzed a large-scale screen summarization dataset annotated by human workers. Our dataset contains more than 112k language summaries across $\sim$22k unique UI screens. We then experimented with a set of deep models with different configurations. Our evaluation of these models with both automatic accuracy metrics and human rating shows that our approach can generate high-quality summaries for mobile screens. We demonstrate potential use cases of Screen2Words and open-source our dataset and model to lay the foundations for further bridging language and user interfaces.

Human-Computer Interaction; Artificial Intelligence; Machine Learning

S2TA: Exploiting Structured Sparsity for Energy-Efficient Mobile CNN Acceleration

173 - Zhi-Gang Liu , Paul N. Whatmough , Yuhao Zhu 2021

Exploiting sparsity is a key technique in accelerating quantized convolutional neural network (CNN) inference on mobile devices. Prior sparse CNN accelerators largely exploit unstructured sparsity and achieve significant speedups. Due to the unbounded, largely unpredictable sparsity patterns, however, exploiting unstructured sparsity requires complicated hardware design with significant energy and area overhead, which is particularly detrimental to mobile/IoT inference scenarios where energy and area efficiency are crucial. We propose to exploit structured sparsity, more specifically, Density Bound Block (DBB) sparsity for both weights and activations. DBB block tensors bound the maximum number of non-zeros per block. DBB thus exposes statically predictable sparsity patterns that enable lean sparsity-exploiting hardware. We propose new hardware primitives to implement DBB sparsity for (static) weights and (dynamic) activations, respectively, with very low overheads. Building on top of the primitives, we describe S2TA, a systolic array-based CNN accelerator that exploits joint weight and activation DBB sparsity and new dimensions of data reuse unavailable on the traditional systolic array. S2TA in 16nm achieves more than 2x speedup and energy reduction compared to a strong baseline of a systolic array with zero-value clock gating, over five popular CNN benchmarks. Compared to two recent non-systolic sparse accelerators, Eyeriss v2 (65nm) and SparTen (45nm), S2TA in 65nm uses about 2.2x and 3.1x less energy per inference, respectively.
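The DBB constraint described in this abstract (a bound on the number of non-zeros per block) can be illustrated with a small software sketch. The abstract does not say how the surviving entries are chosen, so the magnitude-based selection below is an assumption, and the function name `dbb_prune` and its parameters are hypothetical; the paper itself concerns hardware primitives, not this Python routine.

```python
def dbb_prune(values, block_size, max_nonzeros):
    """Return a copy of `values` with at most `max_nonzeros` non-zeros per block.

    Assumption (not from the abstract): within each block we keep the
    largest-magnitude entries and zero the rest.
    """
    if block_size <= 0 or max_nonzeros < 0:
        raise ValueError("block_size must be positive and max_nonzeros non-negative")
    pruned = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        # Rank positions by magnitude (descending); ties go to the earlier position.
        ranked = sorted(range(len(block)), key=lambda i: (-abs(block[i]), i))
        keep = set(ranked[:max_nonzeros])
        pruned.extend(v if i in keep else 0 for i, v in enumerate(block))
    return pruned


def max_block_nonzeros(values, block_size):
    """Largest number of non-zeros found in any block (to check the bound)."""
    return max(
        sum(1 for v in values[s:s + block_size] if v != 0)
        for s in range(0, len(values), block_size)
    )
```

For example, `dbb_prune([1, -5, 3, 0, 2, 2, -1, 4], block_size=4, max_nonzeros=2)` leaves no block with more than two non-zeros, which is the statically predictable pattern the abstract refers to.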

Hardware Architecture; Machine Learning

Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines

86 - Shigang Li , Torsten Hoefler 2021

Training large deep learning models at scale is very challenging. This paper proposes Chimera, a novel pipeline parallelism scheme which combines bidirectional pipelines for efficiently training large-scale models. Chimera is a synchronous approach and therefore incurs no loss of accuracy, which makes it more convergence-friendly than asynchronous approaches. Compared with the latest synchronous pipeline approach, Chimera reduces the number of bubbles by up to 50%; benefiting from the sophisticated scheduling of bidirectional pipelines, Chimera has a more balanced activation memory consumption. Evaluations are conducted on Transformer-based language models. For a GPT-2 model with 1.3 billion parameters running on 2,048 GPU nodes of the Piz Daint supercomputer, Chimera improves the training throughput by 1.16x-2.34x over the state-of-the-art synchronous and asynchronous pipeline approaches.

Distributed, Parallel, and Cluster Computing; Machine Learning

Higher order Brezis-Nirenberg problem on hyperbolic spaces: Existence, nonexistence and symmetry of solutions

171 - Jungang Li , Guozhen Lu , Qiaohua Yang 2021

The main purpose of this paper is to establish the existence, nonexistence and symmetry of nontrivial solutions to the higher order Brezis-Nirenberg problems associated with the GJMS operators $P_k$ on bounded domains in the hyperbolic space $\mathbb{H}^n$ as well as on the entire hyperbolic space $\mathbb{H}^n$. Among other techniques, one of our main novelties is the crucial use of Helgason-Fourier analysis on hyperbolic spaces, the higher order Hardy-Sobolev-Maz'ya inequalities, and a careful study of delicate properties of Green's functions of $P_k-\lambda$ on hyperbolic spaces, which are of independent interest in dealing with such problems. Such Green's functions allow us to obtain integral representations of solutions and thus to avoid using the maximum principle to establish the symmetry of solutions.

Analysis of PDEs; Classical Analysis and ODEs

Real-time Dispatchable Region of Active Distribution Networks Based on a Tight Convex Relaxation Model

108 - Wenjing Huang , Zhigang Li , Mohammad Shahidehpour 2021

The uncertainty in distributed renewable generation poses security threats to the real-time operation of distribution systems. The real-time dispatchable region (RTDR) can be used to assess the ability of power systems to accommodate renewable generation at a given base point. DC and linearized AC power flow models are typically used for bulk power systems, but they are not suitable for low-voltage distribution networks with large r/x ratios. To balance accuracy and computational efficiency, this paper proposes an RTDR model of AC distribution networks using tight convex relaxation. Convex hull relaxation is adopted to reformulate the AC power flow equations, and the convex hull is approximated by a polyhedron without much loss of accuracy. Furthermore, an efficient adaptive constraint generation algorithm is employed to construct an approximate RTDR to meet the requirements of real-time dispatch. Case studies on the modified IEEE 33-bus distribution system validate the computational efficiency and accuracy of the proposed method.

Systems and Control

Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding

125 - Yang Li , Si Si , Gang Li 2021

Attentional mechanisms are order-invariant. Positional encoding is a crucial component to allow attention-based deep model architectures such as Transformer to address sequences or images where the position of information matters. In this paper, we propose a novel positional encoding method based on learnable Fourier features. Instead of hard-coding each position as a token or a vector, we represent each position, which can be multi-dimensional, as a trainable encoding based on learnable Fourier feature mapping, modulated with a multi-layer perceptron. The representation is particularly advantageous for a spatial multi-dimensional position, e.g., pixel positions on an image, where $L_2$ distances or more complex positional relationships need to be captured. Our experiments based on several public benchmark tasks show that our learnable Fourier feature representation for multi-dimensional positional encoding outperforms existing methods by both improving the accuracy and allowing faster convergence.
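As a rough illustration of the Fourier feature mapping mentioned in this abstract, the sketch below maps a multi-dimensional position to concatenated cosine and sine features of a linear projection. It assumes the common random-Fourier-feature form, scaled by $1/\sqrt{D}$; the abstract does not give the exact formula. The names `init_weights`, `fourier_features` and the `gamma` parameter are hypothetical, and the multi-layer perceptron and the training of the projection weights are omitted.

```python
import math
import random


def init_weights(num_frequencies, position_dim, gamma=1.0, seed=0):
    """Draw the projection matrix from a Gaussian with std 1/gamma.

    In a learnable setting these entries would be trainable parameters;
    here they are only initialized (assumed initialization scheme).
    """
    rng = random.Random(seed)
    return [
        [rng.gauss(0.0, 1.0 / gamma) for _ in range(position_dim)]
        for _ in range(num_frequencies)
    ]


def fourier_features(position, weights):
    """Map a position to [cos(Wx) || sin(Wx)] / sqrt(D), with D = 2 * len(weights)."""
    projections = [sum(w * x for w, x in zip(row, position)) for row in weights]
    scale = 1.0 / math.sqrt(2 * len(weights))
    cos_part = [scale * math.cos(p) for p in projections]
    sin_part = [scale * math.sin(p) for p in projections]
    return cos_part + sin_part


# Example: encode a 2-D pixel position with 4 frequencies (8 features).
weights = init_weights(num_frequencies=4, position_dim=2)
features = fourier_features([0.3, -1.2], weights)
```

Under this form, the inner product of two encodings depends only on the difference between the two positions, which is one way such features can reflect spatial distance.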

Machine Learning; Artificial Intelligence; Computer Vision and Pattern Recognition

Counterfactual Graph Learning for Link Prediction

130 - Tong Zhao , Gang Liu , Daheng Wang 2021

Learning to predict missing links is important for many graph-based applications. Existing methods were designed to learn the observed association between two sets of variables: (1) the observed graph structure and (2) the existence of a link between a pair of nodes. However, the causal relationship between these variables was ignored, and we visit the possibility of learning it by simply asking a counterfactual question: would the link exist or not if the observed graph structure became different? To answer this question by causal inference, we consider the information of the node pair as context, global graph structural properties as treatment, and link existence as outcome. In this work, we propose a novel link prediction method that enhances graph learning by counterfactual inference. It creates counterfactual links from the observed ones, and our method learns representations from both of them. Experiments on a number of benchmark datasets show that our proposed method achieves state-of-the-art performance on link prediction.

Machine Learning
