Formal query building is an important part of complex question answering over knowledge bases. It aims to build correct executable queries for questions. Recent methods try to rank candidate queries generated by a state-transition strategy. However, this candidate generation strategy ignores the structure of queries, resulting in a considerable number of noisy queries. In this paper, we propose a new formal query building approach that consists of two stages. In the first stage, we predict the query structure of the question and leverage the structure to constrain the generation of the candidate queries. We propose a novel graph generation framework to handle the structure prediction task and design an encoder-decoder model to predict the argument of the predetermined operation in each generative step. In the second stage, we follow the previous methods to rank the candidate queries. The experimental results show that our formal query building approach outperforms existing methods on complex questions while staying competitive on simple questions.
From the Michelson interference of a He-Ne laser beam, it is found that the coherence length of the beam decreases as the intensity decreases when the beam passes through a non-selective absorption filter and the intensity becomes low enough. The effect can be explained using the discrete wavelet structure model of classical plane light waves.
Image-based navigation is widely considered the next frontier of minimally invasive surgery. It is believed that image-based navigation will increase access to reproducible, safe, and high-precision surgery, as it may then be performed at acceptable cost and effort. This is because image-based techniques avoid the need for specialized equipment and seamlessly integrate with contemporary workflows. Further, it is expected that image-based navigation will play a major role in enabling mixed reality environments and autonomous, robotic workflows. A critical component of image guidance is 2D/3D registration, a technique to estimate the spatial relationships between 3D structures, e.g., volumetric imagery or tool models, and 2D images thereof, such as fluoroscopy or endoscopy. While image-based 2D/3D registration is a mature technique, its transition from the bench to the bedside has been restrained by well-known challenges, including brittleness of the optimization objective, hyperparameter selection, and initialization, difficulties around inconsistencies or multiple objects, and limited single-view performance. One reason these challenges persist today is that analytical solutions are likely inadequate considering the complexity, variability, and high dimensionality of generic 2D/3D registration problems. The recent advent of machine learning-based approaches to imaging problems that, rather than specifying the desired functional mapping, approximate it using highly expressive parametric models holds promise for solving some of the notorious challenges in 2D/3D registration. In this manuscript, we review the impact of machine learning on 2D/3D registration to systematically summarize the recent advances made by the introduction of this novel technology. Grounded in these insights, we then offer our perspective on the most pressing needs, significant open problems, and possible next steps.
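To make the 2D/3D registration objective concrete, here is a minimal sketch (not the method of any paper reviewed above): given 3D landmarks and a candidate rigid pose, project the landmarks through a pinhole camera and measure the 2D reprojection error against their detected image locations. The focal length, landmark coordinates, and the restriction to a z-axis rotation are illustrative assumptions; real systems optimize all six pose parameters, often over intensity-based rather than landmark-based objectives.

```python
import math

FOCAL = 1000.0  # assumed focal length in pixels (illustrative)

def project(point3d, angle_z, translation):
    """Rotate about the z-axis, translate, then pinhole-project to 2D pixels."""
    x, y, z = point3d
    c, s = math.cos(angle_z), math.sin(angle_z)
    xr, yr = c * x - s * y, s * x + c * y
    xt, yt, zt = xr + translation[0], yr + translation[1], z + translation[2]
    return (FOCAL * xt / zt, FOCAL * yt / zt)

def reprojection_error(points3d, points2d, angle_z, translation):
    """Mean Euclidean distance between projected and detected 2D points."""
    total = 0.0
    for p3, p2 in zip(points3d, points2d):
        u, v = project(p3, angle_z, translation)
        total += math.hypot(u - p2[0], v - p2[1])
    return total / len(points3d)

# With the true pose, the error is zero; registration searches for this pose.
pts3d = [(0.1, 0.0, 1.0), (0.0, 0.2, 1.2), (-0.1, -0.1, 0.9)]
true_pose = (0.05, (0.01, -0.02, 0.5))
pts2d = [project(p, true_pose[0], true_pose[1]) for p in pts3d]
print(reprojection_error(pts3d, pts2d, true_pose[0], true_pose[1]))  # 0.0
```

The brittleness mentioned above arises because such objectives are highly non-convex in the pose parameters, which is one motivation for learned approaches.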
FDG PET/CT imaging is a resource-intensive examination critical for managing malignant disease and is particularly important for longitudinal assessment during therapy. Approaches to automate longitudinal analysis present many challenges, including the lack of available longitudinal datasets, the management of complex, large multimodal imaging examinations, and the need for detailed annotations for traditional supervised machine learning. In this work we develop OncoNet, a novel machine learning algorithm that assesses treatment response from 1,954 pairs of sequential FDG PET/CT exams through weak supervision using the standardized uptake values (SUVmax) in associated radiology reports. OncoNet demonstrates an AUROC of 0.86 and 0.84 on internal and external institution test sets, respectively, for determination of change between scans, while also showing strong agreement with clinical scoring systems, with a kappa score of 0.8. We also curated a dataset of 1,954 paired FDG PET/CT exams designed for response assessment for the broader machine learning in healthcare research community. Automated assessment of radiographic response from FDG PET/CT with OncoNet could provide clinicians with a valuable tool to rapidly and consistently interpret change over time in longitudinal multimodal imaging exams.
Ammonia (NH3) in a terrestrial planet atmosphere is generally a good biosignature gas, primarily because terrestrial planets have no significant known abiotic NH3 source. The conditions required for NH3 to accumulate in the atmosphere are, however, stringent. NH3's high water solubility and high bio-usability likely prevent NH3 from accumulating in the atmosphere to detectable levels unless life is a net source of NH3 and produces enough NH3 to saturate the surface sinks. Only then can NH3 accumulate in the atmosphere with a reasonable surface production flux. For the highly favorable planetary scenario of terrestrial planets with H2-dominated atmospheres orbiting M dwarf stars (M5V), we find that a column-averaged mixing ratio of at least about 5 ppm is needed for NH3 to be detectable with JWST, considering a 10 ppm JWST systematic noise floor. When the surface is saturated with NH3 (i.e., there are no NH3-removal reactions on the surface), the required biological surface flux to reach 5 ppm is on the order of 10^10 molecules cm^-2 s^-1, comparable to the terrestrial biological production of CH4. However, when the surface is unsaturated with NH3, due to additional sinks present on the surface, life would have to produce NH3 at surface flux levels on the order of 10^15 molecules cm^-2 s^-1 (approx. 4.5x10^6 Tg year^-1). This value is roughly 20,000 times greater than the biological production of NH3 on Earth and about 10,000 times greater than Earth's CH4 biological production. Volatile amines have similar solubilities and reactivities to NH3 and hence share NH3's weaknesses and strengths as a biosignature. Finally, to establish NH3 as a biosignature gas, we must rule out mini-Neptunes with deep atmospheres, where temperatures and pressures are high enough for NH3's atmospheric production.
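The unit conversion behind the quoted flux can be checked directly. The sketch below converts a surface flux in molecules cm^-2 s^-1 to a global annual mass rate in Tg/yr, assuming an Earth-sized planet; the physical constants are standard values, not taken from the abstract.

```python
# Standard constants (assumed, not from the abstract)
AVOGADRO = 6.022e23          # molecules per mole
M_NH3 = 17.03                # g/mol, molar mass of NH3
SURFACE_AREA = 5.1e18        # cm^2, surface area of an Earth-sized planet
SECONDS_PER_YEAR = 3.156e7

def flux_to_tg_per_year(flux_molec_cm2_s):
    """Global mass production rate (Tg/yr) for a given surface flux."""
    molecules_per_year = flux_molec_cm2_s * SURFACE_AREA * SECONDS_PER_YEAR
    grams_per_year = molecules_per_year / AVOGADRO * M_NH3
    return grams_per_year / 1e12  # 1 Tg = 1e12 g

# The unsaturated-surface case: 10^15 molecules cm^-2 s^-1
print(flux_to_tg_per_year(1e15))  # ~4.5e6 Tg/yr, matching the abstract
```

This confirms that 10^15 molecules cm^-2 s^-1 corresponds to roughly 4.5x10^6 Tg of NH3 per year.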
Data augmentation is an effective way to improve the performance of many neural text generation models. However, current data augmentation methods need to define or choose proper data mapping functions that map the original samples into the augmented samples. In this work, we derive an objective to formulate the problem of data augmentation on text generation tasks without any use of augmented data constructed by specific mapping functions. Our proposed objective can be efficiently optimized and applied to popular loss functions on text generation tasks with a convergence rate guarantee. Experiments on five datasets of two text generation tasks show that our approach can approximate or even surpass popular data augmentation methods.
Hybrid data combining both tabular and textual content (e.g., financial reports) are quite pervasive in the real world. However, Question Answering (QA) over such hybrid data is largely neglected in existing research. In this work, we extract samples from real financial reports to build a new large-scale QA dataset containing both Tabular And Textual data, named TAT-QA, where numerical reasoning is usually required to infer the answer, such as addition, subtraction, multiplication, division, counting, comparison/sorting, and their compositions. We further propose a novel QA model termed TAGOP, which is capable of reasoning over both tables and text. It adopts sequence tagging to extract relevant cells from the table along with relevant spans from the text to infer their semantics, and then applies symbolic reasoning over them with a set of aggregation operators to arrive at the final answer. In our experiments on TAT-QA, TAGOP achieves 58.0% in F1, an 11.1% absolute increase over the previous best baseline model. This result still lags far behind expert human performance, i.e., 90.8% in F1, demonstrating that TAT-QA is very challenging and can serve as a benchmark for training and testing powerful QA models that address hybrid data.
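The symbolic-reasoning step described above can be sketched as a table of aggregation operators applied to the numbers extracted by sequence tagging. The operator names below are illustrative, not the exact set used by TAGOP.

```python
# Illustrative aggregation operators applied to extracted evidence
# (numbers pulled from table cells and text spans by sequence tagging).
OPERATORS = {
    "sum": lambda xs: sum(xs),
    "difference": lambda xs: xs[0] - xs[1],
    "division": lambda xs: xs[0] / xs[1],
    "count": lambda xs: len(xs),
    "average": lambda xs: sum(xs) / len(xs),
}

def answer(op_name, extracted_numbers):
    """Apply the predicted operator to the extracted numbers."""
    return OPERATORS[op_name](extracted_numbers)

# e.g. "What was the change in revenue?" with extracted cells [120.0, 95.0]
print(answer("difference", [120.0, 95.0]))  # 25.0
```

In the full model, a classifier predicts which operator to apply per question, so errors can come from either the tagging or the operator-prediction stage.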
Perceiving nanoscale ferroelectric phenomena in real space is of great importance for elucidating the underlying ferroelectric physics. Over the past decades, nanoscale ferroelectric characterization has mainly relied on Piezoresponse Force Microscopy (PFM); however, the fundamental limitations of PFM have created significant bottlenecks for nanoscale ferroelectric studies. In this study, a high-resolution non-contact ferroelectric measurement, named Non-Contact Heterodyne Electrostrain Force Microscopy (NC-HEsFM), is introduced for the first time. It is unambiguously demonstrated that NC-HEsFM can operate on multiple eigenmodes to perform ideal high-resolution ferroelectric domain mapping, standard ferroelectric hysteresis loop measurement, and controllable domain manipulation. Using a quartz tuning fork (QTF) sensor and heterodyne detection, NC-HEsFM shows an unprecedented capability for achieving truly non-contact yet non-destructive ferroelectric characterization with a negligible electrostatic force effect. It is believed that NC-HEsFM can be extensively used in various ferroelectric or piezoelectric studies while providing substantially improved characterization performance. Meanwhile, the QTF-based force detection makes NC-HEsFM highly compatible with high-vacuum and low-temperature environments, providing ideal conditions for achieving the ultra-high spatial resolution needed to investigate the most intrinsic ferroelectric phenomena.
This paper studies the transmit beamforming in a downlink integrated sensing and communication (ISAC) system, where a base station (BS) equipped with a uniform linear array (ULA) sends combined information-bearing and dedicated radar signals to simultaneously perform downlink multiuser communication and radar target sensing. Under this setup, we maximize the radar sensing performance (in terms of minimizing the beampattern matching errors or maximizing the minimum beampattern gains), subject to the communication users' minimum signal-to-interference-plus-noise ratio (SINR) requirements and the BS's transmit power constraints. In particular, we consider two types of communication receivers, namely Type-I and Type-II receivers, which do not have and do have the capability of cancelling the interference from the a priori known dedicated radar signals, respectively. Under both Type-I and Type-II receivers, the beampattern matching and minimum beampattern gain maximization problems are solved to global optimality by applying the semidefinite relaxation (SDR) technique, together with a rigorous proof of the tightness of SDR for both receiver types under the two design criteria. It is shown that at the optimality, dedicated radar signals are not required with Type-I receivers under some specific conditions, while dedicated radar signals are always needed to enhance the performance with Type-II receivers. Numerical results show that the minimum beampattern gain maximization leads to significantly higher beampattern gains at the worst-case sensing angles with a much lower computational complexity than the beampattern matching design. It is also shown that by exploiting the capability of cancelling the interference caused by the radar signals, the case with Type-II receivers results in better sensing performance than that with Type-I receivers and other conventional designs.
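For readers unfamiliar with the quantity being optimized, here is a minimal sketch of the transmit beampattern gain for a ULA: at angle theta it is a(theta)^H R a(theta), where a is the steering vector (half-wavelength spacing assumed) and R is the covariance matrix of the transmitted signal. The array size and covariance below are toy values, not from the paper.

```python
import cmath
import math

def steering_vector(n_antennas, theta):
    """ULA steering vector with half-wavelength element spacing."""
    return [cmath.exp(1j * math.pi * n * math.sin(theta))
            for n in range(n_antennas)]

def beampattern_gain(R, theta):
    """Compute a(theta)^H R a(theta) for covariance matrix R (list of lists)."""
    n = len(R)
    a = steering_vector(n, theta)
    Ra = [sum(R[i][j] * a[j] for j in range(n)) for i in range(n)]
    return sum(a[i].conjugate() * Ra[i] for i in range(n)).real

# Isotropic transmission (R = I/n) yields unit gain at every angle,
# since a^H (I/n) a = |a|^2 / n = 1.
n = 8
R = [[1.0 / n if i == j else 0.0 for j in range(n)] for i in range(n)]
print(beampattern_gain(R, 0.3))  # approximately 1.0
```

The designs in the paper shape R (subject to SINR and power constraints) so that this gain concentrates at the sensing angles of interest.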
We study joint learning of Convolutional Neural Network (CNN) and Transformer for vision-language pre-training (VLPT), which aims to learn cross-modal alignments from millions of image-text pairs. State-of-the-art approaches extract salient image regions and align regions with words step-by-step. As region-based visual features usually represent parts of an image, it is challenging for existing vision-language models to fully understand the semantics from paired natural languages. In this paper, we propose SOHO to See Out of tHe bOx, which takes a whole image as input and learns vision-language representation in an end-to-end manner. SOHO does not require bounding box annotations, which enables inference 10 times faster than region-based approaches. In particular, SOHO learns to extract comprehensive yet compact image features through a visual dictionary (VD) that facilitates cross-modal understanding. VD is designed to represent consistent visual abstractions of similar semantics. It is updated on-the-fly and utilized in our proposed pre-training task Masked Visual Modeling (MVM). We conduct experiments on four well-established vision-language tasks by following standard VLPT settings. In particular, SOHO achieves absolute gains of 2.0% R@1 score on the MSCOCO text retrieval 5k test split, 1.5% accuracy on the NLVR$^2$ test-P split, and 6.7% accuracy on the SNLI-VE test split, respectively.
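The visual dictionary described above can be sketched as a vector-quantization-style lookup: each dense image feature is mapped to its nearest codeword, giving a compact, consistent visual abstraction. The codebook and features below are toy data, and this sketch omits the on-the-fly dictionary update that SOHO performs during pre-training.

```python
def nearest_codeword(feature, codebook):
    """Return the index of the codebook entry closest to the feature (L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(feature, codebook[k]))

# Toy 2-D codebook standing in for learned visual abstractions.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(nearest_codeword([0.9, 0.1], codebook))  # 1
```

In MVM, the index returned by such a lookup serves as the discrete "label" of a masked visual token that the model must predict.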