Research papers and master's and doctoral theses published by Feng Shi

A Survey on Audio Synthesis and Audio-Visual Multimodal Processing

Zhaofeng Shi (2021)

With the development of deep learning and artificial intelligence, audio synthesis plays a pivotal role in machine learning and shows strong applicability in industry. Meanwhile, researchers are devoting significant effort to multimodal tasks such as audio-visual multimodal processing. In this paper, we survey audio synthesis and audio-visual multimodal processing to help readers understand current research and future trends. The review focuses on text-to-speech (TTS), music generation, and several tasks that combine visual and acoustic information. We comprehensively classify and introduce the corresponding technical methods and discuss their future development trends. This survey can provide guidance for researchers interested in areas such as audio synthesis and audio-visual multimodal processing.

Audio and Speech Processing, Sound (Computer Science)

Transformer-based Machine Learning for Fast SAT Solvers and Logic Synthesis

Feng Shi, Chonghan Lee, Mohammad Khairul Bashar (2021)

CNF-based SAT and MaxSAT solvers are central to logic synthesis and verification systems. The increasing popularity of these constraint problems in electronic design automation encourages studies of different SAT problems and their properties to further improve computational efficiency. Modern conflict-driven clause learning (CDCL) SAT solvers have achieved both theoretical and practical success, allowing very large industrial instances to be solved in a relatively short time. Recently, machine learning approaches have added a new dimension to solving this challenging problem. Neural symbolic models could serve as generic solvers that are specialized for specific domains through data alone, without any change to the model structure. In this work, we propose a one-shot model derived from the Transformer architecture to solve the MaxSAT problem, the optimization version of SAT in which the goal is to satisfy the maximum number of clauses. Our model has a scale-free structure that can process instances of varying size. We use meta-paths and a self-attention mechanism to capture interactions among homogeneous nodes, and cross-attention mechanisms on the bipartite graph to capture interactions among heterogeneous nodes. We further apply an iterative algorithm to our model to satisfy additional clauses, yielding solutions that approach those of the exact SAT problem. The attention mechanisms exploit parallelism for speedup. Our evaluation shows improved speedup compared with heuristic approaches and an improved completion rate compared with machine learning approaches.
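
The MaxSAT objective described above is the number of clauses an assignment satisfies. The Python sketch below makes that objective concrete. It pairs the objective with a plain greedy variable-flip heuristic, a classical local-search baseline. This is not the paper's Transformer model or its iterative algorithm. The helper names `count_satisfied` and `greedy_improve` are hypothetical, and clauses are written as DIMACS-style signed integers, an encoding assumed here for illustration.

```python
from typing import Dict, List, Tuple

Clause = List[int]  # DIMACS-style literals: 3 means x3, -3 means NOT x3 (assumed encoding)


def count_satisfied(clauses: List[Clause], assignment: Dict[int, bool]) -> int:
    """MaxSAT objective: number of clauses with at least one true literal."""
    return sum(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


def greedy_improve(
    clauses: List[Clause], assignment: Dict[int, bool], max_rounds: int = 100
) -> Tuple[Dict[int, bool], int]:
    """Hypothetical baseline: repeatedly flip the single variable that most
    increases the satisfied-clause count; stop at a local optimum."""
    assignment = dict(assignment)  # work on a copy; the caller's dict is untouched
    best = count_satisfied(clauses, assignment)
    for _ in range(max_rounds):
        best_var, best_score = None, best
        for var in assignment:
            assignment[var] = not assignment[var]  # try the flip
            score = count_satisfied(clauses, assignment)
            assignment[var] = not assignment[var]  # undo it
            if score > best_score:
                best_var, best_score = var, score
        if best_var is None:  # no single flip helps
            break
        assignment[best_var] = not assignment[best_var]
        best = best_score
    return assignment, best
```

Example: take `clauses = [[1, 2], [-1, 2], [-2, 3], [-3, -1]]` and the starting assignment `{1: True, 2: False, 3: True}`, which satisfies 2 of the 4 clauses. Two greedy flips reach an assignment that satisfies all 4. In general, greedy flipping can stall in a local optimum.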

Neural and Evolutionary Computing, Artificial Intelligence, Machine Learning

STAR: Sparse Transformer-based Action Recognition

Feng Shi, Chonghan Lee, Liang Qiu (2021)

Research on cognitive systems for human action and behavior has moved into the deep learning era, and the advent of Graph Convolutional Networks (GCNs) in particular has transformed the field in recent years. However, previous work has mainly focused on over-parameterized, complex models based on dense graph convolutional networks, which are inefficient in training and inference. Meanwhile, Transformer-based models have not yet been well explored for cognitive applications in human action and behavior estimation. This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of the data. Our model can also process video clips of variable length grouped into a single batch. Experiments show that, compared with the baseline models, our model achieves competitive accuracy with far fewer trainable parameters and fast training and inference, reaching a 4 to 18× speedup at 1/7 to 1/15 of the model size.
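
The abstract does not specify the form of the temporal attention. The Python sketch below therefore assumes a standard kernelized linear attention with the feature map $\phi(x) = \mathrm{elu}(x) + 1$, a common choice in the linear-attention literature. It applies that attention independently inside fixed-length temporal segments. The function names, the feature map, and `seg_len` are illustrative assumptions, not the paper's implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def phi(x: float) -> float:
    # elu(x) + 1: a strictly positive feature map often used for linear attention (assumed)
    return x + 1.0 if x > 0 else math.exp(x)


def linear_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Non-causal linear attention over one segment: O(T * d * dv) instead of O(T^2)."""
    d, dv, T = len(Q[0]), len(V[0]), len(K)
    fk = [[phi(x) for x in row] for row in K]
    # Summarize keys/values once: S = sum_j phi(k_j)^T v_j (d x dv), z = sum_j phi(k_j) (d)
    S = [[sum(fk[j][a] * V[j][b] for j in range(T)) for b in range(dv)] for a in range(d)]
    z = [sum(fk[j][a] for j in range(T)) for a in range(d)]
    out: Matrix = []
    for q in Q:
        fq = [phi(x) for x in q]
        norm = sum(fq[a] * z[a] for a in range(d))  # positive because phi > 0
        out.append([sum(fq[a] * S[a][b] for a in range(d)) / norm for b in range(dv)])
    return out


def segmented_linear_attention(Q: Matrix, K: Matrix, V: Matrix, seg_len: int) -> Matrix:
    """Split the time axis into consecutive segments of seg_len frames; attend within each."""
    out: Matrix = []
    for start in range(0, len(Q), seg_len):
        end = start + seg_len  # the last segment may be shorter
        out.extend(linear_attention(Q[start:end], K[start:end], V[start:end]))
    return out
```

Each segment's keys and values are summarized once into `S` and `z`. As a result, the cost grows linearly with segment length rather than quadratically. The output equals ordinary normalized attention with $\phi(q)\cdot\phi(k)$ as the similarity score.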

Computer Vision and Pattern Recognition, Artificial Intelligence

VersaGNN: a Versatile accelerator for Graph neural networks

Feng Shi, Ahren Yiqiao Jin, Song-Chun Zhu (2021)

Graph Neural Networks (GNNs) are a promising approach for analyzing graph-structured data, capturing its dependency information through node-level message passing. They have achieved state-of-the-art performance in many tasks, such as node classification, graph matching, clustering, and graph generation. Because GNNs operate on non-Euclidean data, their irregular data access patterns incur considerable computational cost and overhead on conventional architectures such as GPUs and CPUs. Our analysis shows that GNNs follow a hybrid computing model. The Aggregation (or Message Passing) phase performs vector additions on vectors fetched with irregular strides. The Transformation (or Node Embedding) phase can be either a dense or a sparse-dense matrix multiplication. In this work, we propose VersaGNN, an ultra-efficient, systolic-array-based versatile hardware accelerator that unifies dense and sparse matrix multiplication. By applying this single optimized systolic array to both the aggregation and transformation phases, we significantly reduce chip size and energy consumption. We then divide the computing engine into blocked systolic arrays to support Strassen's algorithm for dense matrix multiplication, substantially reducing the number of multiplications and enabling high-throughput GNN computation. To balance the workload of sparse-dense matrix multiplication, we also introduce a greedy algorithm that combines sparse sub-matrices in compressed format into condensed ones, reducing computational cycles. Compared with current state-of-the-art GNN software frameworks, VersaGNN achieves on average a 3712$\times$ speedup with a 1301.25$\times$ energy reduction on CPU, and a 35.4$\times$ speedup with a 17.66$\times$ energy reduction on GPU.
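
Strassen's algorithm replaces the eight block products of a 2×2 blocked matrix multiplication with seven. That substitution is the source of the reduction in multiplications mentioned above. The recursive Python sketch below shows the seven products for square matrices whose size is a power of two. The `leaf` cutoff, where recursion stops and a plain triple loop takes over, is a hypothetical stand-in for one fixed-size systolic-array block, not VersaGNN's actual hardware mapping.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def add(A: Matrix, B: Matrix) -> Matrix:
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A: Matrix, B: Matrix) -> Matrix:
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def naive_mul(A: Matrix, B: Matrix) -> Matrix:
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)] for i in range(n)]


def split(M: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def join(C11: Matrix, C12: Matrix, C21: Matrix, C22: Matrix) -> Matrix:
    # Concatenate rows side by side, then stack the top and bottom halves
    return [a + b for a, b in zip(C11, C12)] + [a + b for a, b in zip(C21, C22)]


def strassen(A: Matrix, B: Matrix, leaf: int = 1) -> Matrix:
    """Multiply n x n matrices (n a power of two) using Strassen's seven products."""
    n = len(A)
    if n <= leaf:
        return naive_mul(A, B)  # hypothetical stand-in for one systolic-array block
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    M1 = strassen(add(A11, A22), add(B11, B22), leaf)
    M2 = strassen(add(A21, A22), B11, leaf)
    M3 = strassen(A11, sub(B12, B22), leaf)
    M4 = strassen(A22, sub(B21, B11), leaf)
    M5 = strassen(add(A11, A12), B22, leaf)
    M6 = strassen(sub(A21, A11), add(B11, B12), leaf)
    M7 = strassen(sub(A12, A22), add(B21, B22), leaf)
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    return join(C11, C12, C21, C22)
```

With `leaf=1` and $n$ a power of two, the recursion performs $7^{\log_2 n}$ scalar multiplications instead of $n^3$, for example 49 instead of 64 when $n = 4$.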

Machine Learning, Artificial Intelligence, Hardware Architecture

Rydberg quantum computation with nuclear spins in two-electron neutral atoms

Xiao-Feng Shi (2021)

Alkaline-earth-like (AEL) atoms with two valence electrons and a nonzero nuclear spin can be excited to Rydberg states for quantum computing. Typical AEL ground states have no hyperfine splitting, but a GHz-scale splitting seems necessary for Rydberg excitation. Although strong magnetic fields can induce a GHz-scale splitting, weak fields are desirable to avoid noise in experiments. Here, we provide two solutions to this outstanding challenge, using realistic data for well-studied AEL isotopes. In the first theory, the two nuclear-spin qubit states $|0\rangle$ and $|1\rangle$ are excited to Rydberg states $|r\rangle$ with detunings $\Delta$ and 0, respectively, where the MHz-scale detuning $\Delta$ arises from a weak magnetic field on the order of 1 G. With a proper ratio between $\Delta$ and the Rabi frequency $\Omega$, the qubit state $|1\rangle$ can be fully excited to the Rydberg state while $|0\rangle$ is left in its initial state. In the second theory, we show that by choosing appropriate intermediate states, a two-photon Rydberg excitation can be driven from only one of the nuclear-spin qubit states. The second theory applies regardless of the magnetic field strength. These theories provide a versatile means of quantum computation that combines the broad applicability of the Rydberg blockade with the unique advantages of nuclear-spin quantum memory in two-electron neutral atoms.
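
The abstract does not state the required ratio between $\Delta$ and $\Omega$. As an illustration only, the Python sketch below uses the textbook two-level Rabi formula, assuming a square pulse with no decay and no Rydberg interaction. It finds detunings for which a resonant $\pi$ pulse ($t = \pi/\Omega$) fully excites $|1\rangle$ while the detuned $|0\rangle$ completes whole generalized Rabi cycles and returns to the ground state. The smallest such choice is $\Delta = \sqrt{3}\,\Omega$. The function names are hypothetical, and the paper's actual pulse parameters may differ.

```python
import math


def excited_population(omega: float, delta: float, t: float) -> float:
    """Rydberg population after a square pulse of duration t, starting in the ground state.

    Textbook two-level Rabi formula (assumed model); omega and delta are angular
    frequencies, not both zero.
    """
    gen = math.hypot(omega, delta)  # generalized Rabi frequency sqrt(omega^2 + delta^2)
    return (omega / gen) ** 2 * math.sin(gen * t / 2) ** 2


def detuning_for_return(omega: float, m: int = 1) -> float:
    """Detuning at which the detuned state completes m full cycles during t = pi/omega.

    Requires sqrt(omega^2 + delta^2) * (pi / omega) = 2*pi*m,
    so delta = omega * sqrt(4*m^2 - 1).
    """
    return omega * math.sqrt(4 * m * m - 1)
```

Larger values of `m` give the other solutions, $\Delta = \sqrt{15}\,\Omega, \sqrt{35}\,\Omega, \dots$, which require proportionally larger detunings for the same $\Omega$.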

Atomic Physics, Quantum Physics

A novel multiple instance learning framework for COVID-19 severity assessment via data augmentation and self-supervised learning

Zekun Li, Wei Zhao, Feng Shi (2021)

Fast and accurate assessment of COVID-19 severity is an essential problem while millions of people around the world are suffering from the pandemic. Chest CT is currently regarded as a popular and informative imaging tool for COVID-19 diagnosis. However, we observe two issues, weak annotation and insufficient data, that may obstruct automatic COVID-19 severity assessment from CT images. To address these challenges, we propose a novel three-component method: 1) a deep multiple instance learning component with instance-level attention that jointly classifies the bag and weighs the instances, 2) a bag-level data augmentation component that generates virtual bags by reorganizing high-confidence instances, and 3) a self-supervised pretext component that aids the learning process. We systematically evaluated our method on the CT images of 229 COVID-19 cases, including 50 severe and 179 non-severe cases. Our method obtained an average accuracy of 95.8%, with 93.6% sensitivity and 96.4% specificity, outperforming previous work.
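
The first component pools a bag of instance embeddings into one bag representation using instance-level attention. The abstract does not define what an instance is (for example, a region or slice of one patient's CT scan). The Python sketch below follows the widely used attention-based deep MIL pooling of Ilse et al. (2018): each instance gets the score $w^\top \tanh(V h_k)$, and a softmax over instances turns the scores into weights. Whether the paper uses exactly this form is an assumption. The small hand-set `V` and `w` stand in for learned parameters. A bag-level classifier, not shown, would then predict severity from the pooled embedding.

```python
import math
from typing import List, Tuple

Vector = List[float]


def attention_mil_pool(
    instances: List[Vector], V: List[Vector], w: Vector
) -> Tuple[Vector, List[float]]:
    """Pool one bag of instance embeddings into a bag embedding.

    score_k = w . tanh(V h_k); weights = softmax(scores); bag = sum_k weight_k * h_k.
    V (hidden x dim) and w (hidden) stand in for learned parameters (assumed form).
    """
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * hi for wi, hi in zip(w, hidden)))
    m = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # instance-level attention, sums to 1
    dim = len(instances[0])
    bag = [sum(a * h[j] for a, h in zip(weights, instances)) for j in range(dim)]
    return bag, weights
```

The returned weights also indicate which instances drive the bag-level decision. This is the kind of signal a virtual-bag augmentation could use to select high-confidence instances.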

Image and Video Processing, Computer Vision and Pattern Recognition

Transition Slow-Down by Rydberg Interaction of Neutral Atoms and a Fast Controlled-NOT Quantum Gate

Xiao-Feng Shi (2021)

Exploring controllable interactions lies at the heart of quantum science. Neutral Rydberg atoms provide a versatile route toward flexible interactions between single quanta. Previous efforts mainly focused on the excitation annihilation (EA) effect of the Rydberg blockade because of its robustness against interaction fluctuations. We study another effect of the Rydberg blockade, namely transition slow-down (TSD). In TSD, ground-Rydberg cycling in one atom slows down a Rydberg-involved state transition of a nearby atom, in contrast to EA, which annihilates a presumed state transition. TSD can yield an accurate controlled-NOT (CNOT) gate with a sub-$\mu$s duration of about $2\pi/\Omega+\epsilon$ using two pulses, where $\epsilon$ is a negligible transient time for implementing a phase change in the pulse and $\Omega$ is the Rydberg Rabi frequency. This fast and accurate TSD-based CNOT makes neutral atoms comparable to superconducting systems and superior to ion-trap systems.
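
The sub-$\mu$s claim places a requirement on the Rabi frequency. To see it, neglect the transient $\epsilon$ and write $\Omega = 2\pi f$, where $f$ is the Rabi frequency in cyclic units. The gate time then reduces to one Rabi period. The value $f = 2$ MHz in the last step is an arbitrary illustrative choice, not a parameter from the paper.

```latex
T_{\mathrm{CNOT}} \approx \frac{2\pi}{\Omega} + \epsilon = \frac{1}{f} + \epsilon,
\qquad
\frac{1}{f} < 1\,\mu\mathrm{s} \iff f > 1\,\mathrm{MHz},
\qquad
f = 2\,\mathrm{MHz} \;\Rightarrow\; T_{\mathrm{CNOT}} \approx 0.5\,\mu\mathrm{s} + \epsilon .
```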

Quantum Physics, Applied Physics, Atomic Physics

Single-site Rydberg addressing in 3D atomic arrays for quantum computing with neutral atoms

Xiao-Feng Shi (2021)

Neutral-atom arrays are promising for large-scale quantum computing, especially because large-scale qubit arrays can be prepared. An unsolved issue is how to selectively excite one qubit deep inside a 3D atomic array to Rydberg states. In this work, we show two methods for this purpose. The first method relies on a well-known result: in a dipole transition between two quantum states driven by two off-resonant fields of equal strength but opposite detunings $\pm\Delta$, the transition is characterized by two counter-rotating Rabi frequencies $\Omega e^{\pm i\Delta t}$ [or $\pm\Omega e^{\pm i\Delta t}$ if the two fields have a $\pi$-phase difference]. This pair of detuned fields leads to a time-dependent Rabi frequency $2\Omega\cos(\Delta t)$ [or $2i\Omega\sin(\Delta t)$], so that a full transition between the two levels is recovered. We show that when the two detuned fields are sent in different directions, one atom in a 3D optical lattice can be selectively addressed for Rydberg excitation, and when its state is restored, the state of any nontarget atom irradiated in the light path is also restored. Moreover, we find that Rydberg excitation by this method can significantly suppress the fundamental blockade error of a Rydberg gate, paving the way for a high-fidelity entangling gate with the commonly used quasi-rectangular pulses that are easily obtained with pulse pickers. Along the way, we find a second method for single-site Rydberg addressing in 3D, in which a selected target atom can be excited to a Rydberg state while a spin-echo sequence preserves the state of every nontarget atom. The ability to selectively address a target atom in 3D atomic arrays for Rydberg excitation makes it possible to design large-scale neutral-atom information processors based on the Rydberg blockade.
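
The two bracketed results follow from Euler's formula: $e^{i\Delta t} + e^{-i\Delta t} = 2\cos(\Delta t)$ and $e^{i\Delta t} - e^{-i\Delta t} = 2i\sin(\Delta t)$. The short Python check below evaluates both sums of the counter-rotating terms numerically. `summed_rabi` is a hypothetical helper, and the parameter values are arbitrary.

```python
import cmath
import math


def summed_rabi(omega: float, delta: float, t: float, pi_phase: bool = False) -> complex:
    """Add the counter-rotating terms Omega*e^{+i*Delta*t} and (+/-)Omega*e^{-i*Delta*t}.

    pi_phase=True models a pi-phase difference between the two fields (sign flip).
    """
    sign = -1.0 if pi_phase else 1.0
    return omega * cmath.exp(1j * delta * t) + sign * omega * cmath.exp(-1j * delta * t)


# Euler's formula predicts 2*Omega*cos(Delta*t) and 2i*Omega*sin(Delta*t), respectively.
omega, delta, t = 1.5, 4.0, 0.3
print(summed_rabi(omega, delta, t), 2 * omega * math.cos(delta * t))
print(summed_rabi(omega, delta, t, pi_phase=True), 2j * omega * math.sin(delta * t))
```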

Quantum Physics, Atomic Physics, Optics

Reconstruction and interpretation of photon Doppler velocimetry spectrum for ejecta particles from shock-loaded sample in vacuum

Xiao-Feng Shi, Dong-Jun Ma, Song-lin Dang (2020)

The photon Doppler velocimetry (PDV) spectrum is investigated in an attempt to reveal the particle parameters of ejecta from shock-loaded samples in a vacuum. A GPU-accelerated Monte-Carlo algorithm, which considers the multiple-scattering effects of light, is applied to reconstruct the light field of the ejecta and simulate the corresponding PDV spectrum. The influence of the velocity profile, total area mass, and particle size of the ejecta on the simulated spectra is discussed qualitatively. To facilitate a quantitative discussion, a novel theoretical optical model is proposed in which the single-scattering assumption is applied. With this model, the relationships between the particle parameters of ejecta and the peak information of the PDV spectrum are derived, enabling direct extraction of the particle parameters from the PDV spectrum. The values of the ejecta parameters estimated from the experimental spectrum are in good agreement with those measured by a piezoelectric probe.
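
Converting a PDV spectrum into particle velocities rests on the standard Doppler relation for light reflected back along the probe axis, $f_b = 2v/\lambda_0$. The Python sketch below assumes normal incidence, single scattering, and a 1550 nm probe laser. That wavelength is common in PDV but is not stated in the abstract. The sketch converts a velocity to a beat frequency and reads a velocity off the strongest spectral bin. The function names are hypothetical. The paper's model also relates peak information to total area mass and particle size, which this sketch does not attempt.

```python
from typing import Sequence


def beat_frequency(velocity: float, wavelength: float = 1550e-9) -> float:
    """Doppler beat frequency (Hz) for a target moving at `velocity` (m/s) along the probe axis."""
    return 2.0 * velocity / wavelength


def peak_velocity(
    freqs: Sequence[float], power: Sequence[float], wavelength: float = 1550e-9
) -> float:
    """Velocity (m/s) corresponding to the strongest bin of a sampled spectrum."""
    k = max(range(len(power)), key=lambda i: power[i])
    return freqs[k] * wavelength / 2.0
```

For example, a target moving at 1 km/s produces a beat of about 1.29 GHz at 1550 nm.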

Applied Physics

Temporal Action Detection with Multi-level Supervision

Baifeng Shi, Qi Dai, Judy Hoffman (2020)

Training temporal action detectors for video requires large amounts of labeled data, yet such annotation is expensive to collect. Incorporating unlabeled or weakly labeled data to train action detection models could help reduce annotation cost. In this work, we first introduce the Semi-supervised Action Detection (SSAD) task with a mixture of labeled and unlabeled data and analyze different types of errors in the proposed SSAD baselines, which are directly adapted from the semi-supervised classification task. To alleviate the main error of action incompleteness (i.e., missing parts of actions) in SSAD baselines, we further design an unsupervised foreground attention (UFA) module that exploits the independence between foreground and background motion. We then incorporate weakly labeled data into SSAD and propose Omni-supervised Action Detection (OSAD) with three levels of supervision. To help overcome the accompanying action-context confusion problem in OSAD baselines, we design an information bottleneck (IB) that suppresses scene information in non-action frames while preserving action information. We extensively benchmark against the SSAD and OSAD baselines on our created data splits of THUMOS14 and ActivityNet1.2 and demonstrate the effectiveness of the proposed UFA and IB methods. Lastly, we show the benefit of our full OSAD-IB model under limited annotation budgets by exploring the optimal annotation strategy for labeled, unlabeled, and weakly labeled data.

Computer Vision and Pattern Recognition
