Research papers, master's theses, and doctoral dissertations published by Xia Li

PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System

223 - Yuning Du , Chenxia Li , Ruoyu Guo 2021

Optical Character Recognition (OCR) systems have been widely used in a variety of application scenarios, yet designing an OCR system is still a challenging task. In previous work, we proposed a practical ultra lightweight OCR system (PP-OCR) to balance accuracy against efficiency. To improve the accuracy of PP-OCR while keeping its efficiency high, in this paper we propose a more robust OCR system, PP-OCRv2. We introduce a bag of tricks to train a better text detector and a better text recognizer, including Collaborative Mutual Learning (CML), CopyPaste, Lightweight CPU Network (LCNet), Unified-Deep Mutual Learning (U-DML), and Enhanced CTCLoss. Experiments on real data show that the precision of PP-OCRv2 is 7% higher than that of PP-OCR at the same inference cost. It is also comparable to the PP-OCR server models, which use the ResNet series as backbones. All of the above-mentioned models are open-sourced, and the code is available in the GitHub repository PaddleOCR, which is powered by PaddlePaddle.

Computer Vision and Pattern Recognition

BN-NAS: Neural Architecture Search with Batch Normalization

373 - Boyu Chen , Peixia Li , Baopu Li 2021

We present BN-NAS, neural architecture search with Batch Normalization, to accelerate neural architecture search (NAS). BN-NAS can significantly reduce the time required for model training and evaluation in NAS. Specifically, for fast evaluation, we propose a BN-based indicator for predicting subnet performance at a very early training stage. The BN-based indicator also lets us improve training efficiency by training only the BN parameters during supernet training. This is based on our observation that training the whole supernet is not necessary, while training only the BN parameters accelerates network convergence for network architecture search. Extensive experiments show that our method can shorten supernet training time by more than 10 times and subnet evaluation time by more than 600,000 times without losing accuracy.
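The abstract does not spell out how the BN-based indicator is computed. The sketch below assumes one plausible reading: each candidate operation ends in a BN layer, an operation is scored by the mean absolute value of that layer's scale parameters, and a subnet is scored by summing the scores of its chosen operations. The layer names, operation names, and parameter values are hypothetical and serve only to illustrate ranking subnets without evaluating them on data.

```python
from itertools import product

# Hypothetical BN scale parameters (gamma) for each layer and candidate operation.
bn_gammas = {
    "layer1": {"conv3x3": [0.9, 1.1, 0.8], "conv5x5": [0.2, 0.3, 0.1]},
    "layer2": {"conv3x3": [0.4, 0.5, 0.3], "conv5x5": [1.2, 0.7, 1.0]},
}


def op_score(gammas):
    """Assumed indicator for one operation: mean absolute BN scale."""
    return sum(abs(g) for g in gammas) / len(gammas)


def subnet_score(choice):
    """Sum the per-operation scores over the operations a subnet selects."""
    return sum(op_score(bn_gammas[layer][op]) for layer, op in choice.items())


# Enumerate every subnet (one operation per layer) and keep the best-scoring one.
layers = list(bn_gammas)
candidates = [
    dict(zip(layers, ops))
    for ops in product(*(bn_gammas[layer] for layer in layers))
]
best = max(candidates, key=subnet_score)
print(best, round(subnet_score(best), 3))
```

Because the score only reads parameters that already exist in the supernet, ranking a subnet needs no forward pass, which is consistent with the large evaluation speed-up the abstract reports.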

Computer Vision and Pattern Recognition

PSViT: Better Vision Transformer via Token Pooling and Attention Sharing

102 - Boyu Chen , Peixia Li , Baopu Li 2021

In this paper, we observe two levels of redundancy when applying vision transformers (ViT) to image recognition. First, fixing the number of tokens throughout the whole network produces redundant features at the spatial level. Second, the attention maps among different transformer layers are redundant. Based on these observations, we propose PSViT, a ViT with token Pooling and attention Sharing, to reduce the redundancy, effectively enhancing the feature representation ability and achieving a better speed-accuracy trade-off. Specifically, in PSViT, token pooling is defined as the operation that decreases the number of tokens at the spatial level. In addition, attention sharing is built between neighboring transformer layers to reuse attention maps that are strongly correlated across adjacent layers. Then, a compact set of the possible combinations of different token pooling and attention sharing mechanisms is constructed. Based on this compact set, the number of tokens in each layer and the choice of layers that share attention can be treated as hyper-parameters that are learned from data automatically. Experimental results show that the proposed scheme can achieve up to 6.6% accuracy improvement in ImageNet classification compared with DeiT.
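The two mechanisms can be illustrated with a minimal pure-Python sketch. It is not the PSViT implementation: it assumes a one-dimensional token sequence, uses the tokens themselves as queries, keys, and values (no learned projections), and pools by averaging consecutive tokens. All function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(tokens):
    """Row-wise softmax of scaled dot products between all token pairs."""
    d = len(tokens[0])
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens])
        for q in tokens
    ]


def apply_attention(attn, tokens):
    """Mix tokens with the given attention weights."""
    d = len(tokens[0])
    return [[sum(w * t[j] for w, t in zip(row, tokens)) for j in range(d)] for row in attn]


def pool_tokens(tokens, stride=2):
    """Token pooling: average each group of `stride` consecutive tokens."""
    d = len(tokens[0])
    pooled = []
    for i in range(0, len(tokens), stride):
        group = tokens[i:i + stride]
        pooled.append([sum(t[j] for t in group) / len(group) for j in range(d)])
    return pooled


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
attn = attention_map(tokens)            # computed once, in the first layer
layer1 = apply_attention(attn, tokens)
layer2 = apply_attention(attn, layer1)  # attention sharing: the map is reused, not recomputed
pooled = pool_tokens(layer2)            # token pooling: 4 tokens become 2
print(len(tokens), "->", len(pooled))
```

Sharing skips the quadratic attention computation in the second layer, and pooling shrinks the sequence that every later layer has to process.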

Computer Vision and Pattern Recognition

Rectified Euler k-means and Beyond

83 - Yunxia Lin , Songcan Chen 2021

Euler k-means (EulerK) first maps data onto the unit hyper-sphere surface of an equi-dimensional space via a complex mapping, which induces the robust Euler kernel, and then applies the popular $k$-means. Consequently, besides enjoying the virtues of $k$-means such as simplicity and scalability to large data sets, EulerK is also robust to noise and outliers. Even so, the centroids captured by EulerK deviate from the unit hyper-sphere surface and thus, in a strict distributional sense, are actually outliers. This odd phenomenon also occurs in some generic kernel clustering methods. Intuitively, using such outlier-like centroids seems unreasonable, yet the issue has received little attention. To eliminate the deviation, we propose two Rectified Euler k-means methods, REK1 and REK2, which retain the merits of EulerK while acquiring real centroids that reside on the mapped space and better characterize the data structures. Specifically, REK1 rectifies EulerK by imposing a constraint on the centroids, while REK2 views each centroid as the mapped image of a pre-image in the original space and optimizes these pre-images in the Euler kernel induced space. Our proposed REKs can methodologically be extended to solve other problems of this category. Finally, experiments validate the effectiveness of REK1 and REK2.
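The deviation described above is easy to reproduce numerically. The sketch below assumes the commonly used form of the Euler mapping, in which each coordinate x is sent to (1/sqrt(2)) * exp(i * alpha * pi * x); the abstract itself only says "a complex mapping". The rescaling step is a hypothetical stand-in for a sphere constraint and is not claimed to be the REK1 or REK2 update. With this normalization, a mapped d-dimensional point has norm sqrt(d/2), which equals 1 for the two-dimensional toy data used here.

```python
import cmath
import math

ALPHA = 1.0  # assumed kernel parameter, chosen only for illustration


def euler_map(x, alpha=ALPHA):
    """Map each real coordinate to (1/sqrt(2)) * exp(i * alpha * pi * x)."""
    return [cmath.exp(1j * alpha * math.pi * v) / math.sqrt(2) for v in x]


def norm(z):
    """Euclidean norm of a complex vector."""
    return math.sqrt(sum(abs(c) ** 2 for c in z))


def mean(points):
    """Coordinate-wise mean, i.e. the usual k-means centroid."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def project_to_sphere(z, radius):
    """Rescale z to the given norm (hypothetical rectification step)."""
    scale = radius / norm(z)
    return [c * scale for c in z]


data = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4]]
mapped = [euler_map(x) for x in data]
radius = math.sqrt(len(data[0]) / 2)  # every mapped point has this norm

centroid = mean(mapped)
rectified = project_to_sphere(centroid, radius)

print("mapped norms:   ", [round(norm(z), 6) for z in mapped])
print("centroid norm:  ", round(norm(centroid), 6))   # strictly inside the sphere
print("rectified norm: ", round(norm(rectified), 6))  # back on the sphere
```

The plain mean of distinct points on a sphere always lies strictly inside it, which is why the EulerK centroids cannot themselves be images of any data point.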

Machine Learning

SODA: A Semantics-Aware Optimization Framework for Data-Intensive Applications Using Hybrid Program Analysis

69 - Bingbing Rao , Zixia Liu , Hong Zhang 2021

In the era of data explosion, a growing number of data-intensive computing frameworks, such as Apache Hadoop and Spark, have been proposed to handle massive volumes of unstructured data in parallel. Since the programming models provided by these frameworks allow users to specify complex and diversified user-defined functions (UDFs) with predefined operations, tuning overall system performance becomes a grand challenge when programmers do not fully understand the semantics of the code, the data, and the runtime systems. In this paper, we design a holistic semantics-aware optimization for data-intensive applications using hybrid program analysis (SODA) to help programmers tune performance issues. SODA is a two-phase framework: the offline phase is a static analysis that analyzes code and performance profiling data from the online phase of prior executions to generate a parameterized and instrumented application; the online phase is a dynamic analysis that keeps track of the application's execution and collects runtime information about data and system. Extensive experimental results on four real-world Spark applications show that SODA can run up to 60%, 10%, and 8% faster than the original implementation with the three proposed optimization strategies, i.e., cache management, operation reordering, and element pruning, respectively.

Distributed, Parallel, and Cluster Computing

Binary irreducible quasi-cyclic parity-check subcodes of Goppa codes and extended Goppa codes

86 - Xia Li , Qin Yue , Daitao Huang 2021

Goppa codes are particularly appealing for cryptographic applications. Every improvement of our knowledge of Goppa codes is of particular interest. In this paper, we present a sufficient and necessary condition for an irreducible monic polynomial $g(x)$ of degree $r$ over $\mathbb{F}_{q}$ satisfying $\gamma g(x)=(x+d)^r g(A(x))$, where $q=2^n$, $A=\left(\begin{array}{cc} a&b\\ 1&d\end{array}\right)\in PGL_2(\mathbb{F}_{q})$, $\mathrm{ord}(A)$ is a prime, $g(a)\ne 0$, and $0\ne \gamma\in \mathbb{F}_q$. And we give a complete characterization of irreducible polynomials $g(x)$ of degree $2s$ or $3s$ as above, where $s$ is a positive integer. Moreover, we construct some binary irreducible quasi-cyclic parity-check subcodes of Goppa codes and extended Goppa codes.
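The invariance condition is easier to read once $A(x)$ is written out. The abstract does not define it; the expansion below assumes the usual action of $PGL_2(\mathbb{F}_q)$ on $x$ as a fractional linear transformation, and writes $g(x)=\sum_{i=0}^{r} g_i x^i$ with $g_r=1$ (the coefficients $g_i$ are notation introduced here).

```latex
A(x) = \frac{ax+b}{x+d},
\qquad
(x+d)^r\, g\bigl(A(x)\bigr) = \sum_{i=0}^{r} g_i\,(ax+b)^i\,(x+d)^{r-i}.
```

Under that assumption the right-hand side is a polynomial of degree at most $r$ whose coefficient of $x^r$ is $\sum_{i=0}^{r} g_i a^i = g(a)$. Comparing leading coefficients with $\gamma g(x)$ for monic $g$ gives $\gamma = g(a)$, which is consistent with the conditions $g(a)\ne 0$ and $\gamma\ne 0$ appearing together.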

Information Theory

A Unified Formula of the Optimal Portfolio for Piecewise HARA Utilities

119 - Zongxia Liang , Yang Liu , Ming Ma 2021

We propose a general family of piecewise hyperbolic absolute risk aversion (PHARA) utility, including many non-standard utilities as examples. A typical application is the composition of an HARA preference and a piecewise linear payoff in hedge fund management. We derive a unified closed-form formula of the optimal portfolio, which is a four-term division. The formula has clear economic meanings, reflecting the behavior of risk aversion, risk seeking, loss aversion and first-order risk aversion. One main finding is that risk-taking behaviors are greatly increased by non-concavity and reduced by non-differentiability.

Statistics and Mathematical Finance, Optimization and Control, Portfolio Management

GLiT: Neural Architecture Search for Global and Local Image Transformer

294 - Boyu Chen , Peixia Li , Chuming Li 2021

We introduce the first Neural Architecture Search (NAS) method to find a better transformer architecture for image recognition. Recently, transformers without CNN-based backbones have been found to achieve impressive performance for image recognition. However, the transformer was designed for NLP tasks and thus could be sub-optimal when used directly for image recognition. To improve the visual representation ability of transformers, we propose a new search space and search algorithm. Specifically, we introduce a locality module that explicitly models the local correlations in images at lower computational cost. With the locality module, our search space is defined to let the search algorithm freely trade off between global and local information as well as optimize the low-level design choices in each module. To tackle the problem caused by the huge search space, a hierarchical neural architecture search method is proposed to search for the optimal vision transformer at two levels separately with an evolutionary algorithm. Extensive experiments on the ImageNet dataset demonstrate that our method can find more discriminative and efficient transformer variants than the ResNet family (e.g., ResNet101) and the baseline ViT for image classification.

Computer Vision and Pattern Recognition

A Simple and Practical Approach to Improve Misspellings in OCR Text

88 - Junxia Lin (Georgetown University Medical Center , Georgetown University) 2021

The focus of our paper is the identification and correction of non-word errors in OCR text. Such errors may result from the incorrect insertion, deletion, or substitution of a character, or from the transposition of two adjacent characters within a single word. They may also result from word boundary problems that lead to run-on errors and incorrect-split errors. Traditional N-gram correction methods can handle single-word errors effectively, but they show limitations when dealing with split and merge errors. In this paper, we develop an unsupervised method that can handle both kinds of error. The method leads to a sizable improvement in correction rates. This tutorial paper addresses very difficult word correction problems - namely incorrect run-on and split errors - and illustrates what needs to be considered when addressing such problems. We outline a possible approach and assess its success in a limited study.
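The two word-boundary error types can be made concrete with a small sketch. It is not the authors' unsupervised method: it swaps in a plain lexicon lookup, and the tiny `LEXICON` and both function names are hypothetical. A run-on error is repaired by finding a split point that yields two known words; an incorrect split is repaired by merging a fragment with the token that follows it.

```python
# Hypothetical lexicon; a real system would use a large word list or corpus counts.
LEXICON = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}


def split_run_on(token):
    """Return two lexicon words whose concatenation is `token`, or None."""
    for i in range(1, len(token)):
        left, right = token[:i], token[i:]
        if left in LEXICON and right in LEXICON:
            return [left, right]
    return None


def correct(tokens):
    """Repair incorrect splits (merge) and run-ons (split) in a token list."""
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in LEXICON:
            out.append(tok)
            i += 1
            continue
        # Incorrect split: this fragment plus the next token forms a known word.
        if i + 1 < len(tokens) and tok + tokens[i + 1] in LEXICON:
            out.append(tok + tokens[i + 1])
            i += 2
            continue
        # Run-on: the token is two known words fused together.
        parts = split_run_on(tok)
        out.extend(parts if parts else [tok])  # leave unknown tokens unchanged
        i += 1
    return out


print(correct("the qui ck brownfox jumps".split()))
```

A lexicon lookup like this cannot rank competing repairs or handle a fragment that also contains a character error, which is where corpus statistics become necessary.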

Computation and Language

Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation

90 - Lei Ke , Xia Li , Martin Danelljan 2021

Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes. Most approaches only exploit the temporal dimension to address the association problem, while relying on single-frame predictions for the segmentation mask itself. We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information for online multiple object tracking and segmentation. PCAN first distills a space-time memory into a set of prototypes and then employs cross-attention to retrieve rich information from past frames. To segment each object, PCAN adopts a prototypical appearance module to learn a set of contrastive foreground and background prototypes, which are then propagated over time. Extensive experiments demonstrate that PCAN outperforms current video instance tracking and segmentation competition winners on both the Youtube-VIS and BDD100K datasets, and is effective with both one-stage and two-stage segmentation frameworks. Code will be available at http://vis.xyz/pub/pcan.

Computer Vision and Pattern Recognition
