Research papers, master's theses, and doctoral dissertations published by Bo Zhang

Characterization of the frequency response of channel-interleaved photonic ADCs based on the optical time-division demultiplexer

133 - Na Qian , Linbo Zhang , Jianping Chen 2021

We characterize the frequency response of channel-interleaved photonic analog-to-digital converters (CI-PADCs) theoretically and experimentally. The CI-PADC is composed of a photonic frontend for photonic sampling and an electronic backend for quantization. The photonic frontend includes a photonic sampling pulse generator for direct high-speed sampling and an optical time-division demultiplexer (OTDM) for channel demultiplexing. It is found that the frequency response of the CI-PADC is influenced by both the photonic sampling pulses and the OTDM, whose combined impact can be characterized through the demultiplexed pulse trains. First, the frequency response can be divided into multiple frequency intervals, and the width of each interval equals the repetition rate of the demultiplexed pulse trains. Second, the analog bandwidth of the CI-PADC is determined by the optical spectral bandwidth of the demultiplexed pulse trains, which is broadened in the OTDM. Further, the effect of the OTDM is essential for enlarging the analog bandwidth of a CI-PADC that employs photonic sampling pulses with a limited optical spectral bandwidth.

Signal Processing, Optics

A Novel Multi-Centroid Template Matching Algorithm and Its Application to Cough Detection

87 - Shibo Zhang , Ebrahim Nemati , Tousif Ahmed 2021

Cough is a major symptom of respiratory-related diseases. There exists a tremendous amount of work on detecting coughs from audio, but there has been no effort to identify coughs solely from an inertial measurement unit (IMU). Coughing causes motion across the whole body, especially the neck and head. Therefore, head motion data captured during coughing by a head-worn IMU sensor can be leveraged to detect coughs using a template matching algorithm. In time series template matching problems, K-Nearest Neighbors (KNN) combined with an elastic distance measure (especially Dynamic Time Warping (DTW)) achieves outstanding performance. However, it is often regarded as prohibitively time-consuming. The Nearest Centroid Classifier was subsequently proposed, but its accuracy is compromised because only one centroid is obtained for each class. The Centroid-based Classifier performs clustering and averaging for each cluster, but requires the number of clusters to be set manually. We propose a novel self-tuning multi-centroid template-matching algorithm, which can automatically adjust the number of clusters to balance accuracy and inference time. Through experiments conducted on synthetic datasets and a real-world earbud-based cough dataset, we demonstrate the superiority of our proposed algorithm and present the result of cough detection with a single accelerometer sensor on the earbuds platform.

Computer Audio Systems, Human-Computer Interaction, Machine Learning
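
The template-matching baseline mentioned in the cough-detection abstract above (nearest neighbour under Dynamic Time Warping) can be illustrated with a minimal sketch. This is not the authors' self-tuning multi-centroid algorithm; the function names, the absolute-difference cost, and the toy sequences are all assumptions made for illustration. Keeping several templates per class, as a multi-centroid method does, would simply mean passing more `(label, sequence)` pairs.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences.

    Uses absolute difference as the local cost (an assumption of this sketch).
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats its sample
                cost[i][j - 1],      # b advances, a repeats its sample
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def classify_1nn(query, templates):
    """Return the label of the DTW-nearest template.

    templates: list of (label, sequence) pairs; hypothetical toy format.
    """
    return min(templates, key=lambda t: dtw_distance(query, t[1]))[0]


# Hypothetical toy templates: a short motion burst versus no motion.
templates = [("cough", [0, 3, 0, 0]), ("still", [0, 0, 0, 0])]
label = classify_1nn([0, 0, 3, 0], templates)
```

The cost of this baseline grows with the number of stored templates, which is the inference-time concern the abstract raises; replacing many templates by a few centroids trades some accuracy for speed.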

Densely Semantic Enhancement for Domain Adaptive Region-free Detectors

219 - Bo Zhang , Tao Chen , Bin Wang 2021

Unsupervised domain adaptive object detection aims to adapt a well-trained detector from its original source domain with rich labeled data to a new target domain with unlabeled data. Previous works focus on improving the domain adaptability of region-based detectors, e.g., Faster-RCNN, through matching cross-domain instance-level features that are explicitly extracted from a region proposal network (RPN). However, this is unsuitable for region-free detectors such as the single shot detector (SSD), which perform a dense prediction from all possible locations in an image and do not have an RPN to encode such instance-level features. As a result, these methods fail to align important image regions and crucial instance-level features between the domains of region-free detectors. In this work, we propose an adversarial module, the densely semantic enhancement module (DSEM), to strengthen the cross-domain matching of instance-level features for region-free detectors. Firstly, to emphasize the important regions of an image, the DSEM learns to predict a transferable foreground enhancement mask that can be utilized to suppress the background disturbance in an image. Secondly, considering that region-free detectors recognize objects of different scales using multi-scale feature maps, the DSEM encodes both multi-level semantic representations and multi-instance spatial-contextual relationships across different domains. Finally, the DSEM is pluggable into different region-free detectors, ultimately achieving densely semantic feature matching via adversarial learning. Extensive experiments have been conducted on the PASCAL VOC, Clipart, Comic, Watercolor, and FoggyCityscape benchmarks, and the results demonstrate that the proposed approach not only improves the domain adaptability of region-free detectors but also outperforms existing domain adaptive region-based detectors under various domain shift settings.

Computer Vision and Pattern Recognition

Object-aware Long-short-range Spatial Alignment for Few-Shot Fine-Grained Image Classification

98 - Yike Wu , Bo Zhang , Gang Yu 2021

The goal of few-shot fine-grained image classification is to recognize rarely seen fine-grained objects in the query set, given only a few samples of this class in the support set. Previous works focus on learning discriminative image features from a limited number of training samples for distinguishing various fine-grained classes, but overlook the fact that spatial alignment of the discriminative semantic features between the query image, which may undergo arbitrary changes, and the support image is also critical for computing the semantic similarity between each support-query pair. In this work, we propose an object-aware long-short-range spatial alignment approach, which is composed of a foreground object feature enhancement (FOE) module, a long-range semantic correspondence (LSC) module and a short-range spatial manipulation (SSM) module. The FOE is developed to weaken background disturbance and encourage higher foreground object response. To address the problem of long-range object feature misalignment between support-query image pairs, the LSC is proposed to learn the transferable long-range semantic correspondence by a designed feature similarity metric. Further, the SSM module is developed to refine the transformed support feature after the long-range step, aligning short-range misaligned features (or local details) with the query features. Extensive experiments have been conducted on four benchmark datasets, and the results show superior performance over most state-of-the-art methods under both 1-shot and 5-shot classification scenarios.

Computer Vision and Pattern Recognition

Quantum kernels with squeezed-state encoding for machine learning

128 - Long Hin Li , Dan-Bo Zhang , Z. D. Wang 2021

Kernel methods are powerful for machine learning, as they can represent data in feature spaces in which similarities between samples may be faithfully captured. Recently, it has been realized that machine learning enhanced by quantum computing is closely related to kernel methods, where the exponentially large Hilbert space turns out to be a feature space more expressive than classical ones. In this paper, we generalize quantum kernel methods by encoding data into continuous-variable quantum states, which can benefit from the infinite-dimensional Hilbert space of continuous variables. Specifically, we propose squeezed-state encoding, in which data is encoded in either the amplitude or the phase. The kernels can be calculated on a quantum computer and are then combined with classical machine learning, e.g. a support vector machine, for training and prediction tasks. Comparisons with classical kernels are also addressed. Lastly, we discuss physical implementations of squeezed-state encoding for machine learning on quantum platforms such as trapped ions.

Quantum Physics

INVIGORATE: Interactive Visual Grounding and Grasping in Clutter

98 - Hanbo Zhang , Yunfan Lu , Cunjun Yu 2021

This paper presents INVIGORATE, a robot system that interacts with humans through natural language and grasps a specified object in clutter. The objects may occlude, obstruct, or even stack on top of one another. INVIGORATE embodies several challenges: (i) infer the target object among other occluding objects, from input language expressions and RGB images, (ii) infer object blocking relationships (OBRs) from the images, and (iii) synthesize a multi-step plan to ask questions that disambiguate the target object and to grasp it successfully. We train separate neural networks for object detection, for visual grounding, for question generation, and for OBR detection and grasping. They allow for unrestricted object categories and language expressions, subject to the training datasets. However, errors in visual perception and ambiguity in human languages are inevitable and negatively impact the robot's performance. To overcome these uncertainties, we build a partially observable Markov decision process (POMDP) that integrates the learned neural network modules. Through approximate POMDP planning, the robot tracks the history of observations and asks disambiguation questions in order to achieve a near-optimal sequence of actions that identify and grasp the target object. INVIGORATE combines the benefits of model-based POMDP planning and data-driven deep learning. Preliminary experiments with INVIGORATE on a Fetch robot show significant benefits of this integrated approach to object grasping in clutter with natural language interactions. A demonstration video is available at https://youtu.be/zYakh80SGcU.

Robotics

On Decidability of the Bisimilarity on Higher-order Processes with Parameterization

128 - Xian Xu , Wenbo Zhang (Shanghai Ocean University) 2021

Higher-order processes with parameterization are capable of abstraction and application (migrated from the lambda-calculus), and thus are computationally more expressive. For the minimal higher-order concurrency, it is well known that the strong bisimilarity (i.e., the strong bisimulation equality) is decidable in the absence of parameterization. By contrast, whether the strong bisimilarity is still decidable for parameterized higher-order processes remains unclear. In this paper, we focus on this issue. There are basically two kinds of parameterization: one on names and the other on processes. We show that the strong bisimilarity is indeed decidable for higher-order processes equipped with both kinds of parameterization. Then we demonstrate how to adapt the decision approach to build an axiom system for the strong bisimilarity. On top of these results, we provide an algorithm for bisimilarity checking.

Logic in Computer Science
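
For readers unfamiliar with strong bisimilarity, the notion discussed in the abstract above can be illustrated on a much simpler object: a finite labelled transition system. The sketch below is not the decision procedure of the paper (which handles higher-order processes with parameterization); it is a naive greatest-fixed-point check, and the state names and the encoding of transitions as `(source, label, target)` triples are assumptions of this example.

```python
def answers(s, t, trans, rel):
    """True if every move s -a-> s2 is answered by some t -a-> t2 with (s2, t2) in rel."""
    return all(
        any(u == t and b == a and (s2, t2) in rel for (u, b, t2) in trans)
        for (v, a, s2) in trans
        if v == s
    )


def bisimilar(states, trans, p, q):
    """Naive strong-bisimilarity check on a finite labelled transition system.

    Starts from the full relation and deletes pairs that violate the
    transfer condition until nothing changes (greatest fixed point).
    """
    rel = {(s, t) for s in states for t in states}
    changed = True
    while changed:
        changed = False
        for (s, t) in list(rel):
            inverse = {(y, x) for (x, y) in rel}
            if not (answers(s, t, trans, rel) and answers(t, s, trans, inverse)):
                rel.discard((s, t))
                changed = True
    return (p, q) in rel


# Classic example: a.(b + c) is not strongly bisimilar to a.b + a.c
states = {"p0", "p1", "p2", "p3", "q0", "q1", "q2", "q3", "q4"}
trans = {
    ("p0", "a", "p1"), ("p1", "b", "p2"), ("p1", "c", "p3"),
    ("q0", "a", "q1"), ("q0", "a", "q2"),
    ("q1", "b", "q3"), ("q2", "c", "q4"),
}
result = bisimilar(states, trans, "p0", "q0")
```

After the `a` step, `p1` can still choose between `b` and `c`, while each successor of `q0` has already committed to one of them, so the pair `(p0, q0)` is eventually removed from the relation.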

Random Flux Driven Metal to Higher-Order Topological Insulator Transition

110 - Chang-An Li , Song-Bo Zhang , Jan Carl Budich 2021

Random flux is commonly believed to be incapable of driving metal-insulator transitions. Surprisingly, we show that random flux can after all induce a metal-insulator transition in the two-dimensional Su-Schrieffer-Heeger model, thus reporting the first example of such a transition. Remarkably, we find that the resulting insulating phase can even be a higher-order topological insulator with zero-energy corner modes and fractional corner charges, rather than a conventional Anderson insulator. Employing both level statistics and finite-size scaling analysis, we characterize the metal-insulator transition and numerically extract its critical exponent as $\nu=2.48\pm0.08$. To reveal the physical mechanism underlying the transition, we present an effective band structure picture based on the random flux averaged Green's function.

Mesoscale and Nanoscale Physics
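
The abstract above mentions level statistics as one of the tools for locating the metal-insulator transition. One widely used level-statistics diagnostic is the mean adjacent gap ratio; whether the authors use this particular quantity is not stated here, so the sketch below is only an assumed illustration, and the function name is hypothetical.

```python
def mean_gap_ratio(levels):
    """Mean adjacent gap ratio <r> of a list of energy levels.

    For consecutive spacings s_n and s_{n+1}, r_n = min(s_n, s_{n+1}) / max(s_n, s_{n+1}).
    Pairs where both spacings vanish (exact degeneracies) are skipped.
    """
    e = sorted(levels)
    if len(e) < 3:
        raise ValueError("need at least three levels")
    spacings = [hi - lo for lo, hi in zip(e, e[1:])]
    ratios = [
        min(s1, s2) / max(s1, s2)
        for s1, s2 in zip(spacings, spacings[1:])
        if max(s1, s2) > 0
    ]
    if not ratios:
        raise ValueError("all levels are degenerate")
    return sum(ratios) / len(ratios)


# Equally spaced levels give the maximal value of the ratio.
r_uniform = mean_gap_ratio([0.0, 1.0, 2.0, 3.0])
```

As standard reference points, uncorrelated (Poisson) levels, typical of localized states, give a mean ratio of about 0.386, while the Gaussian orthogonal and unitary random-matrix ensembles, typical of extended states, give about 0.53 and 0.60 respectively.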

The Relative Calibration of Radial Velocity for LAMOST Medium Resolution Stellar Spectra

155 - Jianping Xiong , Bo Zhang , Chao Liu 2021

The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) started its medium-resolution spectroscopic (MRS, $R\sim7500$) survey in October 2018. The main scientific goals of the MRS, including binary stars, pulsators, and other variable stars, are pursued with a time-domain spectroscopic survey. However, systematic errors, including the bias induced by wavelength calibration and the systematic differences between spectrographs, have to be carefully considered during radial velocity measurement. In this work, we provide a technique to correct the systematics in the wavelength calibration based on the relative radial velocity measurements from LAMOST MRS spectra. We show that, for stars with multi-epoch spectra, the systematic bias induced by exposures on different nights can be well corrected for LAMOST MRS in each spectrograph. The precision of the radial velocity zero-point of multi-epoch time-domain observations reaches below 0.5 km/s. As a by-product, we also give the constant star candidates, which can serve as secondary radial-velocity standard star candidates for LAMOST MRS time-domain surveys.

Solar and Stellar Astrophysics, Astrophysics of Galaxies, Instrumentation and Methods for Astrophysics
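
The idea of a relative radial-velocity calibration described in the abstract above can be sketched as follows. This is not the authors' pipeline: it assumes, purely for illustration, that each exposure shares one zero-point offset across its stars, that most stars are constant, and that medians are an adequate robust estimator. The data layout and function names are hypothetical.

```python
from statistics import median


def exposure_zero_points(rv):
    """Estimate a radial-velocity zero-point offset (km/s) for each exposure.

    rv: dict mapping exposure id -> dict mapping star id -> measured RV (km/s).
    """
    # Reference RV of each star: median over all exposures in which it appears.
    per_star = {}
    for stars in rv.values():
        for star, value in stars.items():
            per_star.setdefault(star, []).append(value)
    reference = {star: median(values) for star, values in per_star.items()}
    # Offset of an exposure: median residual of its stars against their references.
    return {
        exposure: median(value - reference[star] for star, value in stars.items())
        for exposure, stars in rv.items()
    }


def apply_correction(rv, offsets):
    """Subtract each exposure's estimated offset from its measurements."""
    return {
        exposure: {star: value - offsets[exposure] for star, value in stars.items()}
        for exposure, stars in rv.items()
    }


# Hypothetical toy data: the third night is shifted by +1 km/s.
rv = {
    "night1": {"a": 10.0, "b": 20.0, "c": 30.0},
    "night2": {"a": 10.0, "b": 20.0, "c": 30.0},
    "night3": {"a": 11.0, "b": 21.0, "c": 31.0},
}
offsets = exposure_zero_points(rv)
corrected = apply_correction(rv, offsets)
```

Stars whose corrected velocities remain stable across epochs would be natural constant-star candidates, while genuine variables show up as large residuals that the median largely ignores.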

Structure-Aware Feature Generation for Zero-Shot Learning

213 - Lianbo Zhang , Shaoli Huang , Xinchao Wang 2021

Zero-Shot Learning (ZSL) aims to recognize unseen categories by leveraging auxiliary information, such as attribute embeddings. Despite the encouraging results achieved, prior ZSL approaches focus on improving the discriminative power of seen-class features, yet have largely overlooked the geometric structure of the samples and the prototypes. The subsequent attribute-based generative adversarial network (GAN), as a result, also neglects the topological information in sample generation and further yields inferior performance in classifying the visual features of unseen classes. In this paper, we introduce a novel structure-aware feature generation scheme, termed SA-GAN, to explicitly account for the topological structure in learning both the latent space and the generative networks. Specifically, we introduce a constraint loss to preserve the initial geometric structure when learning a discriminative latent space, and carry out our GAN training with additional supervising signals from a structure-aware discriminator and a reconstruction module. The former supervision distinguishes fake and real samples based on their affinity to class prototypes, while the latter aims to reconstruct the original feature space from the generated latent space. This topology-preserving mechanism enables our method to significantly enhance the generalization capability on unseen classes and consequently improve the classification performance. Experiments on four benchmarks demonstrate that the proposed approach consistently outperforms the state of the art. Our code can be found in the supplementary material and will also be made publicly available.

Computer Vision and Pattern Recognition
