Research papers and master's and doctoral theses published by Jiahui Chen

Emerging vaccine-breakthrough SARS-CoV-2 variants

181 - Rui Wang, Jiahui Chen, Yuta Hozumi 2021

The recent global surge in COVID-19 infections has been fueled by new SARS-CoV-2 variants, including Alpha, Beta, Gamma, and Delta. The molecular mechanism underlying this surge is elusive due to 4,653 non-degenerate mutations on the spike protein, which is the target of most COVID-19 vaccines. Understanding the molecular mechanism of transmission and evolution is a prerequisite for foreseeing the trend of emerging vaccine-breakthrough variants and for designing mutation-proof vaccines and monoclonal antibodies. We integrate the genotyping of 1,489,884 SARS-CoV-2 genome isolates, 130 human antibodies, tens of thousands of mutational data points, topological data analysis, and deep learning to reveal the SARS-CoV-2 evolution mechanism and forecast emerging vaccine-escape variants. We show that infectivity-strengthening and antibody-disruptive co-mutations on the S protein RBD can quantitatively explain the infectivity and virulence of all prevailing variants. We demonstrate that Lambda is as infectious as Delta but is more vaccine-resistant. We analyze emerging vaccine-breakthrough co-mutations in 20 countries, including the United Kingdom, the United States, Denmark, Brazil, and Germany. We envision that natural selection through infectivity will continue to be the main mechanism for viral evolution among unvaccinated populations, while antibody-disruptive co-mutations will fuel the future growth of vaccine-breakthrough variants among fully vaccinated populations. Finally, we have identified the co-mutations that have a great likelihood of becoming dominant: [A411S, L452R, T478K], [L452R, T478K, N501Y], [V401L, L452R, T478K], [K417N, L452R, T478K], [L452R, T478K, E484K, N501Y], and [P384L, K417N, E484K, N501Y]. We predict that they, particularly the last four, will break through existing vaccines. We foresee an urgent need to develop new vaccines that target these co-mutations.

Biomolecules; Populations and Evolution

TL-SDD: A Transfer Learning-Based Method for Surface Defect Detection with Few Samples

187 - Jiahui Cheng, Bin Guo, Jiaqi Liu 2021

Surface defect detection plays an increasingly important role in the manufacturing industry in guaranteeing product quality. Many deep learning methods have been widely used in surface defect detection tasks and have been proven to perform well in defect classification and localization. However, deep learning-based detection methods often require plenty of training data, so they fail to apply to real industrial scenarios, where the distribution of defect categories is often imbalanced. In other words, common defect classes have many samples but rare defect classes have extremely few, and it is difficult for these methods to detect rare defect classes well. To solve the imbalanced distribution problem, in this paper we propose TL-SDD: a novel Transfer Learning-based method for Surface Defect Detection. First, we adopt a two-phase training scheme to transfer knowledge from common defect classes to rare defect classes. Second, we propose a novel Metric-based Surface Defect Detection (M-SDD) model. We design three modules for this model: (1) a feature extraction module, containing feature fusion that combines high-level semantic information with low-level structural information; (2) a feature reweighting module, transforming examples into a reweighting vector that indicates the importance of features; (3) a distance metric module, learning a metric space in which defects are classified by computing distances to representations of each category. Finally, we validate the performance of our proposed method on a real dataset of surface defects of aluminum profiles. Compared to the baseline methods, the performance of our proposed method improves by up to 11.98% for rare defect classes.
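
The distance-based classification idea in module (3) can be illustrated with a minimal sketch. It assumes that each category is represented by the mean of its example feature vectors and that distance is plain Euclidean; the abstract does not specify either choice, and the names `class_prototypes` and `classify` and the toy feature values are hypothetical.

```python
import math


def class_prototypes(support):
    """Average the labelled feature vectors of each class into one prototype.

    Assumption: a class is represented by the mean of its examples.
    """
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return protos


def classify(query, protos):
    """Return the label whose prototype is closest in Euclidean distance."""
    return min(protos, key=lambda label: math.dist(query, protos[label]))


# Toy 2-D features: a common class with several examples, a rare class with one.
support = {
    "scratch": [[0.9, 0.1], [1.1, 0.0], [1.0, 0.2]],
    "dent": [[0.1, 1.0]],
}
protos = class_prototypes(support)
prediction = classify([0.2, 0.8], protos)
```

Because a prototype can be built from a single example, a rare class with very few samples can still be matched by distance, which is the property the abstract relies on.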

Computer Vision and Pattern Recognition; Artificial Intelligence

Practical and Configurable Network Traffic Classification Using Probabilistic Machine Learning

399 - Jiahui Chen, Joe Breen, Jeff M. Phillips 2021

Network traffic classification that is widely applicable and highly accurate is valuable for many network security and management tasks. A flexible and easily configurable classification framework is ideal, as it can be customized for use in a wide variety of networks. In this paper, we propose a highly configurable and flexible machine learning traffic classification method that relies only on statistics of sequences of packets to distinguish known, or approved, traffic from unknown traffic. Our method is based on likelihood estimation, provides a measure of certainty for classification decisions, and can classify traffic at adjustable certainty levels. Our classification method can also be applied in different classification scenarios, each prioritizing a different classification goal. We demonstrate how our classification scheme and all its configurations perform well on real-world traffic from a high performance computing network environment.
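
A likelihood-based decision with an adjustable certainty level might look roughly like the sketch below. It is only an illustration under stated assumptions: known traffic is modelled with an independent Gaussian per feature, and the certainty level is expressed as the fraction of known training flows the cutoff must accept. The abstract does not describe the actual model, and the feature choice (mean packet size, mean inter-arrival time) and all function names are hypothetical.

```python
import math
import statistics


def fit_gaussian(samples):
    """Fit an independent Gaussian (mean, stdev) to each feature column."""
    columns = list(zip(*samples))
    return [(statistics.mean(c), statistics.stdev(c)) for c in columns]


def log_likelihood(x, model):
    """Sum of per-feature Gaussian log-densities for one flow."""
    total = 0.0
    for value, (mu, sigma) in zip(x, model):
        total += -0.5 * math.log(2 * math.pi * sigma ** 2)
        total -= (value - mu) ** 2 / (2 * sigma ** 2)
    return total


def threshold_for(samples, model, keep_fraction):
    """Log-likelihood cutoff that accepts `keep_fraction` of the known samples."""
    scores = sorted((log_likelihood(s, model) for s in samples), reverse=True)
    index = max(0, math.ceil(keep_fraction * len(scores)) - 1)
    return scores[index]


def is_known(x, model, cutoff):
    """Label a flow as known traffic when its log-likelihood reaches the cutoff."""
    return log_likelihood(x, model) >= cutoff


# Hypothetical per-flow statistics: (mean packet size in bytes, mean inter-arrival in ms).
known_flows = [[500, 10], [520, 12], [480, 9], [510, 11], [490, 10]]
model = fit_gaussian(known_flows)
cutoff = threshold_for(known_flows, model, keep_fraction=1.0)
```

Lowering `keep_fraction` raises the cutoff, so fewer flows are accepted as known; this is one simple way to trade missed known traffic against admitted unknown traffic.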

Machine Learning; Networking and Internet Architecture

THUE: Discovering Top-K High Utility Episodes

347 - Shicheng Wan, Jiahui Chen, Wensheng Gan 2021

Episode discovery from an event is a popular framework for data mining tasks and has many real-world applications. An episode is a partially ordered set of objects (e.g., item, node), and each object is associated with an event type. An episode can also be considered a complex event sub-sequence. High-utility episode mining is an interesting utility-driven mining task in the real world. Traditional episode mining algorithms, which rely on a user-set threshold, usually return a huge number of episodes, which is neither intuitive nor time-saving. In general, finding a suitable threshold for a pattern-mining algorithm is a nontrivial and time-consuming task. In this paper, we propose a novel algorithm, called Top-K High Utility Episode (THUE) mining within the complex event sequence, which redefines the previous mining task as obtaining the K highest-utility episodes. We introduce several threshold-raising strategies and optimize the episode-weighted utilization upper bounds to speed up the mining process and effectively reduce the memory cost. Finally, the experimental results on both real-life and synthetic datasets reveal that the THUE algorithm can offer a six to eight orders of magnitude improvement in running time over the state-of-the-art algorithm and has low memory consumption.

Databases

TOPIC: Top-k High-Utility Itemset Discovering

162 - Jiahui Chen, Shicheng Wan, Wensheng Gan 2021

Utility-driven itemset mining is widely applied in many real-world scenarios. However, most algorithms do not work for itemsets with negative utilities. Several efficient algorithms for high-utility itemset (HUI) mining with negative utilities have been proposed. These algorithms can find complete HUIs with or without negative utilities. However, the major problem with these algorithms is how to select an appropriate minimum utility (minUtil) threshold. To address this issue, some efficient algorithms for extracting top-k HUIs have been proposed, where the parameter k is the number of HUIs to be discovered. However, each of these algorithms solves only one part of the above problem. In this paper, we present a method for TOP-k high-utility Itemset disCovering (TOPIC) with positive and negative utility values, which combines the advantages of the above algorithms. TOPIC adopts transaction merging and database projection techniques to reduce the database scanning cost, and utilizes minUtil threshold-raising strategies. It also uses an array-based utility technique, which calculates the utility of itemsets and upper bounds in linear time. We conducted extensive experiments on several real and synthetic datasets, and the results show that TOPIC outperforms state-of-the-art algorithms in terms of runtime, memory costs, and scalability.
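
The idea of replacing a user-chosen minUtil with a top-k search can be sketched as follows. This is a brute-force illustration only, not the TOPIC algorithm: it enumerates every itemset, uses no transaction merging, projection, or upper-bound pruning, and simply raises an internal threshold to the k-th best utility seen so far. The toy database and the names `itemset_utility` and `top_k_itemsets` are hypothetical.

```python
import heapq
from itertools import combinations


def itemset_utility(itemset, database):
    """Return the total utility of `itemset`, or None if no transaction contains it."""
    total, found = 0, False
    for transaction in database:
        if all(item in transaction for item in itemset):
            found = True
            total += sum(transaction[item] for item in itemset)
    return total if found else None


def top_k_itemsets(database, k):
    """Brute-force top-k search that raises min_util as better itemsets appear."""
    items = sorted({item for t in database for item in t})
    heap = []                   # min-heap of (utility, itemset), at most k entries
    min_util = float("-inf")    # internal threshold, raised during the search
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, database)
            if utility is None:
                continue
            if len(heap) < k:
                heapq.heappush(heap, (utility, itemset))
            elif utility > heap[0][0]:
                heapq.heapreplace(heap, (utility, itemset))
            if len(heap) == k:
                min_util = heap[0][0]   # threshold raising: k-th best so far
    return sorted(heap, reverse=True), min_util


# Each transaction maps item -> utility in that transaction (may be negative).
database = [
    {"a": 5, "b": -2, "c": 3},
    {"a": 4, "c": 6},
    {"b": -1, "c": 2, "d": 7},
]
result, final_min_util = top_k_itemsets(database, k=2)
```

In a real miner the raised threshold would also be used to prune candidates early; with negative utilities that pruning needs carefully designed upper bounds, which this sketch deliberately leaves out.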

Databases

Estimate the spectrum of affine dynamical systems from partial observations of a single trajectory data

196 - Jiahui Cheng, Sui Tang 2021

In this paper, we study the nonlinear inverse problem of estimating the spectrum of a system matrix that drives a finite-dimensional affine dynamical system from partial observations of a single trajectory. In the noiseless case, we prove that an annihilating polynomial of the system matrix, whose roots are a subset of the spectrum, can be uniquely determined from data. We then study which eigenvalues of the system matrix can be recovered and derive various sufficient and necessary conditions to characterize the relationship between the recoverability of each eigenvalue and the observation locations. We propose various reconstruction algorithms, with theoretical guarantees, generalizing the classical Prony method, ESPRIT, and the matrix pencil method. We test the algorithms on a variety of examples with applications to graph signal processing, disease modeling, and a real human motion dataset. The numerical results validate our theoretical results and demonstrate the effectiveness of the proposed algorithms, even when the data do not follow an exact linear dynamical system.
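
For orientation, the classical Prony method that the paper generalizes can be sketched in its simplest form. The example assumes a noiseless scalar observation sequence y_k = c1*l1^k + c2*l2^k with two distinct modes and no affine (constant) term, so it is much narrower than the setting in the abstract. It finds the coefficients of the annihilating polynomial z^2 - s*z + p from four samples and returns its roots; the function name `prony_two_modes` and the sample values are hypothetical.

```python
import cmath


def prony_two_modes(y):
    """Recover two eigenvalues from samples y[0..3] of y_k = c1*l1**k + c2*l2**k.

    The samples satisfy y[k+2] = s*y[k+1] - p*y[k] with s = l1 + l2 and
    p = l1 * l2, so (s, p) solve a 2x2 linear system (Cramer's rule below).
    """
    y0, y1, y2, y3 = y[:4]
    det = y0 * y2 - y1 * y1
    if abs(det) < 1e-12:
        raise ValueError("samples do not excite two distinct modes")
    s = (y0 * y3 - y1 * y2) / det   # sum of the eigenvalues
    p = (y1 * y3 - y2 * y2) / det   # product of the eigenvalues
    root = cmath.sqrt(s * s - 4 * p)
    # Roots of the annihilating polynomial z**2 - s*z + p.
    return (s + root) / 2, (s - root) / 2


# Synthetic noiseless observations with eigenvalues 0.9 and 0.5.
samples = [2 * 0.9 ** k + 1 * 0.5 ** k for k in range(4)]
eigenvalues = prony_two_modes(samples)
```

With noisy data or more modes, one would normally use more samples and a least-squares or subspace formulation instead of solving the exact 2x2 system.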

Numerical Analysis; Information Theory

Vaccine-escape and fast-growing mutations in the United Kingdom, the United States, Singapore, Spain, South Africa, and other COVID-19-devastated countries

72 - Rui Wang, Jiahui Chen, Kaifu Gao 2021

Recently, the SARS-CoV-2 variants from the United Kingdom (UK), South Africa, and Brazil have received much attention for their increased infectivity, potentially high virulence, and possible threats to existing vaccines and antibody therapies. The question remains whether other, more infectious variants are being transmitted around the world. We carry out a large-scale study of 252,874 SARS-CoV-2 genome isolates from patients to identify many other rapidly growing mutations on the spike (S) protein receptor-binding domain (RBD). We reveal that 88 out of 95 significant mutations that were observed more than 10 times strengthen the binding between the RBD and the host angiotensin-converting enzyme 2 (ACE2), indicating that the virus is evolving toward more infectious variants. In particular, we discover new fast-growing RBD mutations N439K, L452R, S477N, S477R, and N501T that also enhance the RBD and ACE2 binding. We further unveil that mutation N501Y, involved in the United Kingdom (UK), South Africa, and Brazil variants, may moderately weaken the binding between the RBD and many known antibodies, while mutations E484K and K417N, found in the South Africa and Brazil variants, can potentially disrupt the binding between the RBD and many known antibodies. Among three newly identified fast-growing RBD mutations, L452R, which is now known as part of the California variant B.1.427, and N501T are able to effectively weaken the binding of many known antibodies with the RBD. Finally, we hypothesize that RBD mutations that can simultaneously make SARS-CoV-2 more infectious and disrupt existing antibodies, called vaccine escape mutations, will pose an imminent threat to the current crop of vaccines. A list of the most likely vaccine escape mutations is given, including N501Y, L452R, E484K, N501T, S494P, and K417N.

Populations and Evolution; Quantitative Methods

Methodology-centered review of molecular modeling, simulation, and prediction of SARS-CoV-2

163 - Kaifu Gao, Rui Wang, Jiahui Chen 2021

The deadly coronavirus disease 2019 (COVID-19) pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has gone out of control globally. Despite much effort by scientists, medical experts, and society in general, the slow progress on drug discovery and antibody therapeutic development, the unknown possible side effects of the existing vaccines, and the high transmission rate of SARS-CoV-2 remind us of the sad reality that our current understanding of the transmission, infectivity, and evolution of SARS-CoV-2 is unfortunately very limited. The major limitation is the lack of mechanistic understanding of viral-host cell interactions, viral regulation, protein-protein interactions (including antibody-antigen binding), protein-drug binding, host immune response, etc. This limitation will likely haunt the scientific community for a long time and have devastating consequences for combating COVID-19 and other pathogens. Notably, compared to long-cycle, costly, and safety-demanding molecular-level experiments, theoretical and computational studies are economical, speedy, and easy to perform. There exists a tsunami of literature on molecular modeling, simulation, and prediction of SARS-CoV-2 that has become impossible to cover fully in a review. To provide the reader with a quick update on the status of molecular modeling, simulation, and prediction of SARS-CoV-2, we present a comprehensive and systematic methodology-centered narrative in the nick of time. Aspects such as molecular modeling, Monte Carlo (MC) methods, structural bioinformatics, machine learning, deep learning, and mathematical approaches are included in this review. This review will be beneficial to researchers who are looking for ways to contribute to SARS-CoV-2 studies and those who are assessing the current status of the field.

Biomolecules

On the Construction of a Post-Quantum Blockchain for Smart City

122 - Jiahui Chen, Wensheng Gan, Muchuang Hu 2020

Owing to some special characteristics and features, blockchain is a very useful technique that can securely organize diverse devices in a smart city. It finds wide applications, especially in distributed environments, where entities such as wireless sensors need to be certain of the authenticity of the server. As contemporary blockchain techniques that address post-quantum concerns have not been designed, in this study, we investigate a blockchain in the post-quantum setting and seek to discover how it can resist attacks from quantum computing. In addition, traditional proof of work (PoW)-based consensus protocols such as Bitcoin cannot supply memory mining, and the transaction capacity of each block in a blockchain is limited and needs to be expanded. Thus, a new post-quantum proof of work (post-quantum PoW) consensus algorithm for the security and privacy of smart city applications is proposed. Unlike existing classical hash-based PoW algorithms, it can both protect a blockchain under a quantum computing attack and supply memory mining. Meanwhile, an identity-based post-quantum signature is embedded into the transaction process to construct lightweight transactions. Subsequently, we provide a detailed description of the execution of the post-quantum lightweight transaction in a blockchain. Overall, this work can help enrich the research on future post-quantum blockchains and support the construction or architecture of emerging blockchain-based smart cities.

Cryptography and Security

Prediction and mitigation of mutation threats to COVID-19 vaccines and antibody therapies

148 - Jiahui Chen, Kaifu Gao, Rui Wang 2020

Antibody therapeutics and vaccines are among our last resorts to end the raging COVID-19 pandemic. They, however, are vulnerable to over 5,000 mutations on the spike (S) protein uncovered by a Mutation Tracker based on over 200,000 genome isolates. It is imperative to understand how mutations would impact vaccines and antibodies in development. In this work, we study the mechanism, frequency, and ratio of mutations on the S protein. Additionally, we use 56 antibody structures and analyze their 2D and 3D characteristics. Moreover, we predict the mutation-induced binding free energy (BFE) changes for the complexes of S protein and antibodies or ACE2. By integrating genetics, biophysics, deep learning, and algebraic topology, we reveal that most of the 462 mutations on the receptor-binding domain (RBD) will weaken the binding of S protein and antibodies and disrupt the efficacy and reliability of antibody therapies and vaccines. A list of 31 vaccine escape mutants is identified, while many other disruptive mutations are detailed as well. We also unveil that about 65% of existing RBD mutations, including those variants recently found in the United Kingdom (UK) and South Africa, are binding-strengthening mutations, resulting in more infectious COVID-19 variants. We discover the disparity between the extreme values of RBD mutation-induced BFE strengthening and weakening of the bindings with antibodies and ACE2, suggesting that SARS-CoV-2 is at an advanced stage of evolution for human infection, while the human immune system is able to produce optimized antibodies. This discovery implies the vulnerability of current vaccines and antibody drugs to new mutations. Our predictions were validated by comparison with more than 1,400 deep mutations on the S protein RBD. Our results show the urgent need to develop new mutation-resistant vaccines and antibodies and to prepare for seasonal vaccinations.

Biomolecules
