
Incorporating domain knowledge into neural-guided search

Published by Brenden Petersen
Publication date: 2021
Research field: Informatics Engineering
Paper language: English





Many AutoML problems involve optimizing discrete objects under a black-box reward. Neural-guided search provides a flexible means of searching these combinatorial spaces using an autoregressive recurrent neural network. A major benefit of this approach is that it builds up objects sequentially; this provides an opportunity to incorporate domain knowledge into the search by directly modifying the logits emitted during sampling. In this work, we formalize a framework for incorporating such in situ priors and constraints into neural-guided search, and provide sufficient conditions for enforcing constraints. We integrate several priors and constraints from existing works into this framework, propose several new ones, and demonstrate their efficacy in informing the task of symbolic regression.
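To make the logit-modification idea concrete, here is a minimal sketch, not the paper's implementation: the token library, the toy policy, and the specific constraint and prior below are illustrative assumptions. A hard constraint sets the logits of disallowed tokens to negative infinity before sampling, while a soft prior adds a finite offset.

```python
# Minimal sketch of in situ priors/constraints during autoregressive sampling.
# All names (TOKENS, constraint_mask, prior_logits) are illustrative, not an API.
import numpy as np

TOKENS = ["add", "mul", "sin", "cos", "x", "const"]  # toy expression library

def policy_logits(partial_sequence, rng):
    """Stand-in for the RNN emitting one logit per token at this step."""
    return rng.normal(size=len(TOKENS))

def constraint_mask(partial_sequence):
    """Hard constraint example: forbid nesting trig inside trig.
    Returns 0 for allowed tokens and -inf for disallowed ones."""
    mask = np.zeros(len(TOKENS))
    if partial_sequence and partial_sequence[-1] in ("sin", "cos"):
        for trig in ("sin", "cos"):
            mask[TOKENS.index(trig)] = -np.inf
    return mask

def prior_logits(partial_sequence):
    """Soft prior example: mildly discourage constants early in the expression."""
    prior = np.zeros(len(TOKENS))
    if len(partial_sequence) < 2:
        prior[TOKENS.index("const")] = -1.0
    return prior

def sample_token(partial_sequence, rng):
    # In situ adjustment: add mask and prior to the emitted logits, then sample.
    logits = policy_logits(partial_sequence, rng)
    logits = logits + constraint_mask(partial_sequence) + prior_logits(partial_sequence)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return TOKENS[rng.choice(len(TOKENS), p=probs)]

rng = np.random.default_rng(0)
seq = []
for _ in range(5):
    seq.append(sample_token(seq, rng))
print(seq)
```

Because the mask is applied before the softmax, disallowed tokens receive exactly zero probability, which is what distinguishes a constraint from a prior in this setup.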


Read also

The verification community has studied dynamic data structures primarily in a bottom-up way by analyzing pointers and the shapes induced by them. Recent work in fields such as separation logic has made significant progress in extracting shapes from program source code. Many real-world programs, however, manipulate complex data whose structure and content is most naturally described by formalisms from object-oriented programming and databases. In this paper, we look at the verification of programs with dynamic data structures from the perspective of content representation. Our approach is based on description logic, a widely used knowledge representation paradigm which gives a logical underpinning for diverse modeling frameworks such as UML and ER. Technically, we assume that we have separation logic shape invariants obtained from a shape analysis tool, and requirements on the program data in terms of description logic. We show that the two-variable fragment of first-order logic with counting and trees (whose decidability was proved at LICS 2013) can be used as a joint framework to embed suitable fragments of description logic and separation logic.
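As a hedged illustration of the kind of embedding the abstract mentions (the axiom below is my own example, not taken from the paper): a standard description logic inclusion translates into first-order logic using only two reusable variables, which is why suitable fragments of description logic fit inside the two-variable fragment with counting.

```latex
% Illustrative only: a textbook ALC-to-first-order translation that stays
% within the two-variable fragment (only x and y are used).
% DL axiom:  Patient \sqsubseteq \exists hasRecord.Record
\forall x\, \bigl( \mathit{Patient}(x) \rightarrow
    \exists y\, ( \mathit{hasRecord}(x, y) \wedge \mathit{Record}(y) ) \bigr)
```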
Although deep learning models like CNNs have achieved great success in medical image analysis, the small size of medical datasets remains a major bottleneck in this area. To address this problem, researchers have started looking for external information beyond currently available medical datasets. Traditional approaches generally leverage the information from natural images via transfer learning. More recent works utilize the domain knowledge from medical doctors, to create networks that resemble how medical doctors are trained, mimic their diagnostic patterns, or focus on the features or areas they pay particular attention to. In this survey, we summarize the current progress on integrating medical domain knowledge into deep learning models for various tasks, such as disease diagnosis; lesion, organ and abnormality detection; and lesion and organ segmentation. For each task, we systematically categorize different kinds of medical domain knowledge that have been utilized and their corresponding integrating methods. We also provide current challenges and directions for future research.
The trouble with data is that often it provides only an imperfect representation of the phenomenon of interest. When reading and interpreting data, personal knowledge about the data plays an important role. Data visualization, however, has neither a concept defining personal knowledge about datasets, nor the methods or tools to robustly integrate them into an analysis process, thus hampering analysts' ability to express their personal knowledge about datasets, and others to learn from such knowledge. In this work, we define such personal knowledge about datasets as data hunches and elevate this knowledge to another form of data that can be externalized, visualized, and used for collaboration. We establish the implications of data hunches and provide a design space for externalizing and communicating data hunches through visualization techniques. We envision such a design space will empower users to externalize their personal knowledge and support the ability to learn from others' data hunches.
Existing technologies expand BERT from different perspectives, e.g. designing different pre-training tasks, different semantic granularities and different model architectures. Few models consider expanding BERT from different text formats. In this paper, we propose a heterogeneous knowledge language model (HKLM), a unified pre-trained language model (PLM) for all forms of text, including unstructured text, semi-structured text and well-structured text. To capture the corresponding relations among these multi-format knowledge sources, our approach uses a masked language model objective to learn word knowledge, and uses triple classification and title matching objectives to learn entity knowledge and topic knowledge, respectively. To obtain the aforementioned multi-format text, we construct a corpus in the tourism domain and conduct experiments on 5 tourism NLP datasets. The results show that our approach outperforms pre-training on plain text using only 1/4 of the data. The code, datasets, corpus and knowledge graph will be released.
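A minimal sketch of how the three objectives named in this abstract could be combined into a single pre-training loss; the function signature, the loss weights, and the use of plain cross-entropy are my assumptions, not the HKLM code.

```python
# Sketch: weighted sum of masked-LM, triple-classification, and title-matching
# losses. All argument names and default weights are hypothetical.
import torch
import torch.nn.functional as F

def hklm_style_loss(mlm_logits, mlm_labels,
                    triple_logits, triple_labels,
                    title_logits, title_labels,
                    w_mlm=1.0, w_triple=1.0, w_title=1.0):
    # Masked LM on unstructured text: cross-entropy over the vocabulary,
    # ignoring unmasked positions (labelled -100 by convention here).
    loss_mlm = F.cross_entropy(
        mlm_logits.view(-1, mlm_logits.size(-1)),
        mlm_labels.view(-1), ignore_index=-100)
    # Triple classification on well-structured triples: is (head, relation, tail) correct?
    loss_triple = F.cross_entropy(triple_logits, triple_labels)
    # Title matching on semi-structured text: does the passage belong to the title/topic?
    loss_title = F.cross_entropy(title_logits, title_labels)
    return w_mlm * loss_mlm + w_triple * loss_triple + w_title * loss_title
```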
Machine learning applications to symbolic mathematics are becoming increasingly popular, yet the field lacks a centralized source of real-world symbolic expressions to be used as training data. In contrast, the field of natural language processing leverages resources like Wikipedia that provide enormous amounts of real-world textual data. Adopting the philosophy of mathematics as language, we bridge this gap by introducing a pipeline for distilling mathematical expressions embedded in Wikipedia into symbolic encodings to be used in downstream machine learning tasks. We demonstrate that a $\textit{mathematical language model}$ trained on this corpus of expressions can be used as a prior to improve the performance of neural-guided search for the task of symbolic regression.
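A minimal sketch, under my own assumptions rather than the paper's code, of how a pretrained expression language model could act as a prior inside neural-guided search: its next-token log-probabilities are added, with a tunable weight, to the search policy's logits before sampling, i.e. the same logit hook described in the main abstract above.

```python
# Sketch: blending a search policy's logits with an LM prior over the same
# token set. The weight, the toy sizes, and both stand-in models are hypothetical.
import numpy as np

def combine_with_lm_prior(policy_logits, lm_log_probs, prior_weight=0.5):
    """Add the LM's next-token log-probabilities, scaled by prior_weight."""
    return policy_logits + prior_weight * lm_log_probs

rng = np.random.default_rng(1)
policy = rng.normal(size=6)                    # stand-in for the RNN policy's logits
lm_prior = np.log(rng.dirichlet(np.ones(6)))   # stand-in for normalized LM next-token probs
logits = combine_with_lm_prior(policy, lm_prior)
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs)
```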
