Research papers, master's and doctoral theses published by Xin Zhou

Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning

366 - Bryan Wang, Gang Li, Xin Zhou 2021

Mobile User Interface Summarization generates succinct language descriptions of mobile screens for conveying important contents and functionalities of the screen, which can be useful for many language-based application scenarios. We present Screen2Words, a novel screen summarization approach that automatically encapsulates essential information of a UI screen into a coherent language phrase. Summarizing mobile screens requires a holistic understanding of the multi-modal data of mobile UIs, including text, image, structures as well as UI semantics, motivating our multi-modal learning approach. We collected and analyzed a large-scale screen summarization dataset annotated by human workers. Our dataset contains more than 112k language summaries across $\sim$22k unique UI screens. We then experimented with a set of deep models with different configurations. Our evaluation of these models with both automatic accuracy metrics and human rating shows that our approach can generate high-quality summaries for mobile screens. We demonstrate potential use cases of Screen2Words and open-source our dataset and model to lay the foundations for further bridging language and user interfaces.

Human-Computer Interaction, Artificial Intelligence, Machine Learning

Prime-valent Symmetric graphs with a quasi-semiregular automorphism

178 - Fu-Gang Yin, Yan-Quan Feng, Jin-Xin Zhou 2021

An automorphism of a graph is called quasi-semiregular if it fixes a unique vertex of the graph and its remaining cycles have the same length. This kind of symmetry of graphs was first investigated by Kutnar, Malnič, Martínez and Marušič in 2013, as a generalization of the well-known semiregular automorphism of a graph. Symmetric graphs of valency three or four, admitting a quasi-semiregular automorphism, have been classified in two recent papers. Let $p\geq 5$ be a prime and $\Gamma$ a connected symmetric graph of valency $p$ admitting a quasi-semiregular automorphism. In this paper, we first prove that either $\Gamma$ is a connected Cayley graph $\mathrm{Cay}(M,S)$ such that $M$ is a $2$-group admitting a fixed-point-free automorphism of order $p$ with $S$ as an orbit of involutions, or $\Gamma$ is a normal $N$-cover of a $T$-arc-transitive graph of valency $p$ admitting a quasi-semiregular automorphism, where $T$ is a non-abelian simple group and $N$ is a nilpotent group. Then in case $p=5$, we give a complete classification of such graphs $\Gamma$ such that either $\mathrm{Aut}(\Gamma)$ has a solvable arc-transitive subgroup or $\Gamma$ is $T$-arc-transitive with $T$ a non-abelian simple group. We also construct the first infinite family of symmetric graphs that have a quasi-semiregular automorphism and an insolvable full automorphism group.
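
The definition in the first sentence of the abstract can be checked mechanically for a small graph. The sketch below is only an illustration: the function names, the list encoding of a permutation, and the choice of the complete graph $K_6$ (a symmetric graph of valency 5) as the example are assumptions made here, not taken from the paper.

```python
from itertools import combinations

def cycle_lengths(perm):
    """Cycle lengths of a permutation given as a list, where i maps to perm[i]."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if seen[start]:
            continue
        length, v = 0, start
        while not seen[v]:
            seen[v] = True
            v = perm[v]
            length += 1
        lengths.append(length)
    return lengths

def is_automorphism(edges, perm):
    """True if the permutation maps the undirected edge set onto itself."""
    edge_set = {frozenset(e) for e in edges}
    return {frozenset((perm[u], perm[v])) for u, v in edges} == edge_set

def is_quasi_semiregular(perm):
    """Exactly one fixed point, and all other cycles share one common length.

    Assumption made here: the identity on a one-vertex graph is excluded,
    so at least one non-trivial cycle is required.
    """
    lengths = cycle_lengths(perm)
    non_fixed = [n for n in lengths if n > 1]
    return lengths.count(1) == 1 and len(non_fixed) >= 1 and len(set(non_fixed)) == 1

# Hypothetical example: K_6 with vertex 0 fixed and vertices 1..5 rotated.
k6_edges = list(combinations(range(6), 2))
rho = [0, 2, 3, 4, 5, 1]
```

Here `is_automorphism(k6_edges, rho)` and `is_quasi_semiregular(rho)` both hold, whereas a transposition such as `[1, 0, 2, 3, 4, 5]` fixes four vertices and so is not quasi-semiregular.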

Combinatorics, Group Theory

Distributed Adaptive Huber Regression

78 - Jiyu Luo, Qiang Sun, Wenxin Zhou 2021

Distributed data naturally arise in scenarios involving multiple sources of observations, each stored at a different location. Directly pooling all the data together is often prohibited due to limited bandwidth and storage, or due to privacy protocols. This paper introduces a new robust distributed algorithm for fitting linear regressions when data are subject to heavy-tailed and/or asymmetric errors with finite second moments. The algorithm only communicates gradient information at each iteration and therefore is communication-efficient. Statistically, the resulting estimator achieves the centralized nonasymptotic error bound as if all the data were pooled together and came from a distribution with sub-Gaussian tails. Under a finite $(2+\delta)$-th moment condition, we derive a Berry-Esseen bound for the distributed estimator, based on which we construct robust confidence intervals. Numerical studies further confirm that compared with extant distributed methods, the proposed methods achieve near-optimal accuracy with low variability and better coverage with tighter confidence width.
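
To make the "only communicates gradient information" idea concrete, here is a minimal sketch of gradient-averaged Huber regression. It is not the authors' algorithm: it uses plain gradient descent with a fixed, user-chosen robustification parameter `tau`, and every name in it (`distributed_huber`, `simulate_shards`, the step size, the simulated Student-t noise) is an assumption made for illustration.

```python
import math
import random

def huber_score(r, tau):
    """Derivative of the Huber loss in the residual: r clipped to [-tau, tau]."""
    return max(-tau, min(tau, r))

def local_gradient(X, y, beta, tau):
    """Average Huber-loss gradient on one machine; only this vector leaves the machine."""
    grad = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        r = yi - sum(b * x for b, x in zip(beta, xi))
        w = huber_score(r, tau)
        for j, x in enumerate(xi):
            grad[j] -= w * x
    return [g / len(y) for g in grad]

def distributed_huber(shards, tau, step=0.5, iters=200):
    """Gradient descent where the center averages the per-machine gradients."""
    d = len(shards[0][0][0])
    beta = [0.0] * d
    total = sum(len(y) for _, y in shards)
    for _ in range(iters):
        grads = [(len(y), local_gradient(X, y, beta, tau)) for X, y in shards]
        # Sample-size-weighted average of the local gradients.
        avg = [sum(n * g[j] for n, g in grads) / total for j in range(d)]
        beta = [b - step * a for b, a in zip(beta, avg)]
    return beta

def simulate_shards(beta_true, n_machines=4, n_per_machine=250, seed=0):
    """Hypothetical data: intercept plus one Gaussian feature, Student-t(3) noise."""
    rng = random.Random(seed)
    shards = []
    for _ in range(n_machines):
        X, y = [], []
        for _ in range(n_per_machine):
            x = [1.0, rng.gauss(0.0, 1.0)]
            chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))
            eps = rng.gauss(0.0, 1.0) / math.sqrt(chi2 / 3.0)
            X.append(x)
            y.append(sum(b * v for b, v in zip(beta_true, x)) + eps)
        shards.append((X, y))
    return shards
```

A call such as `distributed_huber(simulate_shards([1.0, -2.0]), tau=2.0)` never moves raw observations between shards; each round exchanges one gradient vector per machine. The adaptive choice of the robustification parameter described in the abstract is deliberately left out.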

Methodology

Adaptive Capped Least Squares

161 - Qiang Sun, Rui Mao, Wen-Xin Zhou 2021

This paper proposes the capped least squares regression with an adaptive resistance parameter, hence the name, adaptive capped least squares regression. The key observation is that, by taking the resistance parameter to be data dependent, the proposed estimator achieves full asymptotic efficiency without losing the resistance property: it achieves the maximum breakdown point asymptotically. Computationally, we formulate the proposed regression problem as a quadratic mixed integer programming problem, which becomes computationally expensive when the sample size gets large. The data-dependent resistance parameter, however, makes the loss function more convex-like for larger-scale problems. This makes a fast randomly initialized gradient descent algorithm possible for global optimization. Numerical examples indicate the superiority of the proposed estimator compared with classical methods. Three data applications to cancer cell lines, stationary background recovery in video surveillance, and blind image inpainting showcase its broad applicability.
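
The following sketch illustrates what a capped squared loss does to outliers. It assumes the loss has the form $\min(r^2, \tau^2)/2$ and takes the resistance parameter `tau` as a fixed input; the paper's data-dependent rule for choosing it is not reproduced, and the function names, step size, and toy data are hypothetical.

```python
def residuals(X, y, beta):
    """Residuals y_i - x_i^T beta for a design given as a list of rows."""
    return [yi - sum(b * x for b, x in zip(beta, xi)) for xi, yi in zip(X, y)]

def capped_loss(res, tau):
    """Mean of min(r^2, tau^2) / 2, so each point contributes at most tau^2 / 2."""
    return sum(min(r * r, tau * tau) for r in res) / (2 * len(res))

def capped_gradient(X, y, beta, tau):
    """Gradient of the capped loss: points with |r| > tau contribute nothing."""
    grad = [0.0] * len(beta)
    for xi, r in zip(X, residuals(X, y, beta)):
        if abs(r) <= tau:
            for j, x in enumerate(xi):
                grad[j] -= r * x
    return [g / len(y) for g in grad]

def gradient_descent(X, y, beta0, tau, step=1.0, iters=100):
    """Plain gradient descent on the capped loss from a given starting point."""
    beta = list(beta0)
    for _ in range(iters):
        g = capped_gradient(X, y, beta, tau)
        beta = [b - step * gj for b, gj in zip(beta, g)]
    return beta

# Toy intercept-only problem with one gross outlier (values are made up).
X_toy = [[1.0], [1.0], [1.0], [1.0]]
y_toy = [0.9, 1.0, 1.1, 50.0]
```

Starting from `[1.2]` with `tau=1.0`, `gradient_descent(X_toy, y_toy, [1.2], tau=1.0)` settles at the mean of the three inliers, while the ordinary least squares fit (the plain mean, 13.25) is dragged toward the outlier. Because the loss is non-convex, the result generally depends on the starting point.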

Methodology, Statistics Theory

Synthetic Data Are as Good as the Real for Association Knowledge Learning in Multi-object Tracking

95 - Yuchi Liu, Zhongdao Wang, Xiangxin Zhou 2021

Association, aiming to link bounding boxes of the same identity in a video sequence, is a central component in multi-object tracking (MOT). To train association modules, e.g., parametric networks, real video data are usually used. However, annotating person tracks in consecutive video frames is expensive, and such real data, due to their inflexibility, offer us limited opportunities to evaluate the system performance w.r.t. changing tracking scenarios. In this paper, we study whether 3D synthetic data can replace real-world videos for association training. Specifically, we introduce a large-scale synthetic data engine named MOTX, where the motion characteristics of cameras and objects are manually configured to be similar to those in real-world datasets. We show that compared with real data, association knowledge obtained from synthetic data can achieve very similar performance on real-world test sets without domain adaptation techniques. Our intriguing observation is credited to two factors. First and foremost, 3D engines can well simulate motion factors such as camera movement, camera view and object movement, so that the simulated videos can provide association modules with effective motion features. Second, experimental results show that the appearance domain gap hardly harms the learning of association knowledge. In addition, the strong customization ability of MOTX allows us to quantitatively assess the impact of motion factors on MOT, which brings new insights to the community.

Computer Vision and Pattern Recognition

Feature Combination Meets Attention: Baidu Soccer Embeddings and Transformer based Temporal Detection

373 - Xin Zhou, Le Kang, Zhiyu Cheng 2021

With rapidly evolving internet technologies and emerging tools, sports-related videos generated online are increasing at an unprecedentedly fast pace. To automate the sports video editing/highlight generation process, a key task is to precisely recognize and locate the events in long untrimmed videos. In this tech report, we present a two-stage paradigm to detect what events happen, and when, in soccer broadcast videos. Specifically, we fine-tune multiple action recognition models on soccer data to extract high-level semantic features, and design a transformer-based temporal detection module to locate the target events. This approach achieved state-of-the-art performance in both tasks, i.e., action spotting and replay grounding, in the SoccerNet-v2 Challenge, under the CVPR 2021 ActivityNet workshop. Our soccer embedding features are released at https://github.com/baidu-research/vidpress-sports. By sharing these features with the broader community, we hope to accelerate research into soccer video understanding.

Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning

Superconductivity in rhombohedral trilayer graphene

160 - Haoxin Zhou, Tian Xie, Takashi Taniguchi 2021

We report the observation of superconductivity in rhombohedral trilayer graphene electrostatically doped with holes. Superconductivity occurs in two distinct regions within the space of gate-tuned charge carrier density and applied electric displacement field, which we denote SC1 and SC2. The high sample quality allows for detailed mapping of the normal state Fermi surfaces by quantum oscillations, which reveal that in both cases superconductivity arises from a normal state described by an annular Fermi sea that is proximal to an isospin symmetry breaking transition where the Fermi surface degeneracy changes. The upper out-of-plane critical field is $B_{C\perp}\approx 10\,\mathrm{mT}$ for SC1 and $1\,\mathrm{mT}$ for SC2, implying coherence lengths $\xi$ of 200 nm and 600 nm, respectively. The simultaneous observation of transverse magnetic electron focusing implies a mean free path $\ell\gtrsim 3.5\,\mathrm{\mu m}$. Superconductivity is thus deep in the clean limit, with the disorder parameter $d=\xi/\ell<0.1$. SC1 emerges from a paramagnetic normal state and is suppressed with in-plane magnetic fields in agreement with the Pauli paramagnetic limit. In contrast, SC2 emerges from a spin-polarized, valley-unpolarized half-metal. Measurements of the in-plane critical field show that this superconductor exceeds the Pauli limit by at least one order of magnitude. We discuss our results in light of several mechanisms including conventional phonon-mediated pairing, pairing due to fluctuations of the proximal isospin order, and intrinsic instabilities of the annular Fermi liquid. Our observation of superconductivity in a clean and structurally simple two-dimensional metal hosting a variety of gate-tuned magnetic states may enable a new class of field-effect controlled mesoscopic electronic devices combining correlated electron phenomena.
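
The step from critical field to coherence length can be reproduced with a short calculation. The sketch assumes the standard Ginzburg-Landau relation $B_{c2} = \Phi_0 / (2\pi\xi^2)$ with $\Phi_0 = h/(2e)$; the abstract does not state which relation the authors used, and the function name is hypothetical.

```python
import math

# Superconducting magnetic flux quantum h / (2e), in webers.
PHI0 = 2.067833848e-15

def coherence_length(b_c2_tesla):
    """Coherence length in metres from B_c2 = PHI0 / (2 * pi * xi**2).

    Assumes the Ginzburg-Landau relation for an out-of-plane upper critical field.
    """
    return math.sqrt(PHI0 / (2.0 * math.pi * b_c2_tesla))

xi_sc1 = coherence_length(10e-3)  # critical field quoted for SC1
xi_sc2 = coherence_length(1e-3)   # critical field quoted for SC2
```

Under this assumption `xi_sc1` is about 180 nm and `xi_sc2` about 570 nm, which agrees with the quoted 200 nm and 600 nm to one significant figure.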

Mesoscale and Nanoscale Physics, Strongly Correlated Electrons

Normal Cayley digraphs of dihedral groups with CI-property

92 - Jin-Hua Xie, Yan-Quan Feng, Jin-Xin Zhou 2021

A Cayley (di)graph $Cay(G,S)$ of a group $G$ with respect to $S$ is said to be normal if the right regular representation of $G$ is normal in the automorphism group of $Cay(G,S)$, and is called a CI-(di)graph if there is $\alpha\in Aut(G)$ such that $S^\alpha=T$, whenever $Cay(G,S)\cong Cay(G,T)$ for a Cayley (di)graph $Cay(G,T)$. A finite group $G$ is called a DCI-group or a NDCI-group if all Cayley digraphs or normal Cayley digraphs of $G$ are CI-digraphs, and is called a CI-group or a NCI-group if all Cayley graphs or normal Cayley graphs of $G$ are CI-graphs, respectively. Motivated by a conjecture proposed by Adam in 1967, CI-groups and DCI-groups have been actively studied during the last fifty years by many researchers in algebraic graph theory. It took about thirty years to obtain the classification of cyclic CI-groups and DCI-groups, and recently, the first two authors, among others, classified cyclic NCI-groups and NDCI-groups. Even though there are many partial results on dihedral CI-groups and DCI-groups, their classification is still elusive. In this paper, we prove that a dihedral group of order $2n$ is a NCI-group or a NDCI-group if and only if $n=2,4$ or $n$ is odd. As a direct consequence, we have that if a dihedral group $D_{2n}$ of order $2n$ is a DCI-group then $n=2$ or $n$ is odd-square-free, and that if $D_{2n}$ is a CI-group then $n=2,9$ or $n$ is odd-square-free, throwing some new light on the classification of dihedral CI-groups and DCI-groups.
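
The numerical conditions stated in the abstract can be written as simple predicates on $n$. This is only a restatement of those conditions as code, reading "odd-square-free" as "odd and square-free"; the function names are hypothetical and $n$ is assumed to be a positive integer.

```python
def is_square_free(n):
    """True if no prime square divides n (trial division, fine for small n)."""
    p = 2
    while p * p <= n:
        if n % (p * p) == 0:
            return False
        p += 1
    return True

def dihedral_is_nci(n):
    """Stated theorem: D_{2n} is an NCI-group (or NDCI-group) iff n is 2, 4, or odd."""
    return n in (2, 4) or n % 2 == 1

def dihedral_may_be_dci(n):
    """Necessary condition only: a DCI-group D_{2n} has n = 2 or n odd and square-free."""
    return n == 2 or (n % 2 == 1 and is_square_free(n))

def dihedral_may_be_ci(n):
    """Necessary condition only: a CI-group D_{2n} has n in {2, 9} or n odd and square-free."""
    return n in (2, 9) or (n % 2 == 1 and is_square_free(n))
```

The first predicate is an exact characterization, while the other two only rule candidates out: for example `dihedral_may_be_dci(9)` is false, so $D_{18}$ cannot be a DCI-group, whereas a true result does not establish the DCI or CI property.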

Combinatorics, Group Theory

A Fourier-matching Method for Analyzing Resonance Frequencies by a Sound-hard Slab with Arbitrarily Shaped Subwavelength Holes

115 - Wangtao Lu, Wei Wang, Jiaxin Zhou 2021

This paper presents a simple Fourier-matching method to rigorously study resonance frequencies of a sound-hard slab with a finite number of arbitrarily shaped cylindrical holes of diameter ${\cal O}(h)$ for $h\ll 1$. Outside the holes, a sound field can be expressed in terms of its normal derivatives on the apertures of holes. Inside each hole, since the vertical variable can be separated, the field can be expressed in terms of a countable set of Fourier basis functions. Matching the field on each aperture yields a linear system of countable equations in terms of a countable set of unknown Fourier coefficients. The linear system can be reduced to a finite-dimensional linear system based on the invertibility of its principal submatrix, which is proved by the well-posedness of a closely related boundary value problem for each hole in the limiting case $h\to 0$, so that only the leading Fourier coefficient of each hole is preserved in the finite-dimensional system. The resonance frequencies are those making the resulting finite-dimensional linear system rank deficient. By regular asymptotic analysis for $h\ll 1$, we get a systematic asymptotic formula for characterizing the resonance frequencies by the 3D subwavelength structure. The formula reveals an important fact that when all holes are of the same shape, the Q-factor for any resonance frequency asymptotically behaves as ${\cal O}(h^{-2})$ for $h\ll 1$ with its prefactor independent of shapes of holes.

Analysis of PDEs

No Need for Interactions: Robust Model-Based Imitation Learning using Neural ODE

70 - HaoChih Lin, Baopu Li, Xin Zhou 2021

Interactions with either environments or expert policies during training are needed for most of the current imitation learning (IL) algorithms. For IL problems with no interactions, a typical approach is Behavior Cloning (BC). However, BC-like methods tend to be affected by distribution shift. To mitigate this problem, we come up with a Robust Model-Based Imitation Learning (RMBIL) framework that casts imitation learning as an end-to-end differentiable nonlinear closed-loop tracking problem. RMBIL applies Neural ODE to learn precise multi-step dynamics and a robust tracking controller via the Nonlinear Dynamics Inversion (NDI) algorithm. Then, the learned NDI controller will be combined with a trajectory generator, a conditional VAE, to imitate an expert's behavior. Theoretical derivation shows that the controller network can approximate an NDI when minimizing the training loss of Neural ODE. Experiments on Mujoco tasks also demonstrate that RMBIL is competitive with the state-of-the-art generative adversarial method (GAIL) and achieves at least 30% performance gain over BC on uneven surfaces.

Robotics, Machine Learning, Systems and Control
