Biomedical researchers use ontologies to annotate their data with ontology terms, enabling better data integration and interoperability. However, the number, variety and complexity of current biomedical ontologies make it cumbersome for researchers to determine which ones to reuse for their specific needs. To overcome this problem, in 2010 the National Center for Biomedical Ontology (NCBO) released the Ontology Recommender, which is a service that receives a biomedical text corpus or a list of keywords and suggests ontologies appropriate for referencing the indicated terms. We developed a new version of the NCBO Ontology Recommender. Called Ontology Recommender 2.0, it uses a new recommendation approach that evaluates the relevance of an ontology to biomedical text data according to four criteria: (1) the extent to which the ontology covers the input data; (2) the acceptance of the ontology in the biomedical community; (3) the level of detail of the ontology classes that cover the input data; and (4) the specialization of the ontology to the domain of the input data. Our evaluation shows that the enhanced recommender provides higher quality suggestions than the original approach, providing better coverage of the input data, more detailed information about their concepts, increased specialization for the domain of the input data, and greater acceptance and use in the community. In addition, it provides users with more explanatory information, along with suggestions of not only individual ontologies but also groups of ontologies. It also can be customized to fit the needs of different scenarios. Ontology Recommender 2.0 combines the strengths of its predecessor with a range of adjustments and new features that improve its reliability and usefulness. Ontology Recommender 2.0 recommends over 500 biomedical ontologies from the NCBO BioPortal platform, where it is openly available.
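As a rough illustration of how the four criteria in the abstract above could be folded into a single ranking score, the sketch below uses a weighted average. The weights, the class name `CriteriaScores`, and the functions `aggregate` and `rank` are assumptions made for this example; the abstract does not state how Ontology Recommender 2.0 combines its criteria.

```python
from dataclasses import dataclass


@dataclass
class CriteriaScores:
    """Per-ontology scores in [0, 1] for the four criteria named in the abstract."""
    coverage: float
    acceptance: float
    detail: float
    specialization: float


# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"coverage": 0.55, "acceptance": 0.15, "detail": 0.15, "specialization": 0.15}


def aggregate(scores, weights=WEIGHTS):
    """Weighted average of the four criterion scores."""
    total = sum(weights.values())
    return sum(w * getattr(scores, name) for name, w in weights.items()) / total


def rank(candidates, weights=WEIGHTS):
    """Return ontology names ordered from highest to lowest aggregated score."""
    return sorted(candidates, key=lambda name: aggregate(candidates[name], weights), reverse=True)


if __name__ == "__main__":
    # Made-up ontologies and scores, not results from the paper.
    candidates = {
        "ONTO_A": CriteriaScores(coverage=0.9, acceptance=0.4, detail=0.5, specialization=0.6),
        "ONTO_B": CriteriaScores(coverage=0.5, acceptance=0.9, detail=0.7, specialization=0.3),
    }
    print(rank(candidates))
```

Because the weights are a parameter, the same sketch also shows one way a recommender of this kind could be customized for different scenarios: a user who cares mostly about community acceptance would pass a different weight dictionary.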
Populating ontology graphs is a long-standing problem for the Semantic Web community. Recent advances in translation-based graph embedding methods for populating instance-level knowledge graphs have led to promising new approaches to the ontology population problem. However, unlike in instance-level graphs, the majority of relation facts in ontology graphs involve comprehensive semantic relations, which often have the properties of transitivity and symmetry, or are hierarchical. These comprehensive relations are often too complex for existing graph embedding methods, so direct application of such methods is not feasible. Hence, we propose On2Vec, a novel translation-based graph embedding method for ontology population. On2Vec integrates two model components that effectively characterize comprehensive relation facts in ontology graphs. The first is the Component-specific Model, which encodes concepts and relations into low-dimensional embedding spaces without a loss of relational properties; the second is the Hierarchy Model, which performs focused learning of hierarchical relation facts. Experiments on several well-known ontology graphs demonstrate the promising capabilities of On2Vec in predicting and verifying new relation facts. These promising results also make possible significant improvements in related methods.
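To make concrete why symmetric relations are awkward for plain translation-based embeddings, here is a minimal sketch of a generic TransE-style scoring function. It is not On2Vec's Component-specific Model or Hierarchy Model, which the abstract does not specify; the function name and the toy vectors are assumptions for illustration.

```python
import math


def translation_score(head, relation, tail):
    """Euclidean norm of (head + relation - tail); lower means more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


if __name__ == "__main__":
    # Toy 2-d embeddings (illustrative values only).
    a = [0.0, 0.0]
    b = [1.0, 0.0]
    rel = [1.0, 0.0]

    # The fact (a, rel, b) fits perfectly ...
    print(translation_score(a, rel, b))
    # ... but the reversed fact (b, rel, a) does not, even if rel is meant to be symmetric.
    print(translation_score(b, rel, a))
```

Under this scoring, both directions of a symmetric relation can only score zero when the relation vector is zero and the two concept embeddings coincide, which is one reason a plain translation model struggles with the relational properties the abstract mentions.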
Slot filling is a fundamental task in dialog state tracking for task-oriented dialog systems. In a multi-domain task-oriented dialog system, user utterances and system responses may mention multiple named entities and attribute values. The system needs to select those that are confirmed by the user and fill them into the corresponding slots. One difficulty is that a dialogue session contains multiple system-user turns, so feeding all the tokens into a deep model such as BERT can be challenging because of the limited number of input tokens and limited GPU memory. In this paper, we investigate an ontology-enhanced approach that uses the ontology to match the named entities occurring in all dialogue turns. The entities matched in previous dialogue turns are accumulated and encoded as additional inputs to a BERT-based dialogue state tracker. In addition, our improvements include ontology constraint checking and the correction of slot name tokenization. Experimental results show that our ontology-enhanced dialogue state tracker improves the joint goal accuracy (slot F1) from 52.63% (91.64%) to 53.91% (92%) on the MultiWOZ 2.1 corpus.
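The accumulation step described above can be sketched as follows. This is a simplified illustration under stated assumptions: the toy ontology, the single-word matching, and the function names `match_entities`, `accumulate` and `encode_history` are hypothetical, and the abstract does not describe how the authors actually match or encode entities.

```python
import re

# Hypothetical toy ontology: slot name -> allowed single-word values.
ONTOLOGY = {
    "hotel-area": {"north", "south", "centre"},
    "restaurant-food": {"italian", "chinese"},
}


def match_entities(utterance, ontology):
    """Return the (slot, value) pairs whose value appears as a word in the utterance."""
    tokens = set(re.findall(r"[a-z]+", utterance.lower()))
    return {(slot, value)
            for slot, values in ontology.items()
            for value in values
            if value in tokens}


def accumulate(turns, ontology):
    """Collect matched pairs across turns, in order of first mention, without duplicates."""
    seen = []
    for turn in turns:
        for pair in sorted(match_entities(turn, ontology)):
            if pair not in seen:
                seen.append(pair)
    return seen


def encode_history(pairs):
    """Flatten accumulated pairs into a short string that could accompany the current turn."""
    return " ; ".join(f"{slot} {value}" for slot, value in pairs)


if __name__ == "__main__":
    turns = ["I need a hotel in the north.", "Also some Italian food, please."]
    print(encode_history(accumulate(turns, ONTOLOGY)))
```

The point of the design is that the compact history string stands in for the full text of earlier turns, so the tracker's input stays within the token limit of the encoder.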
Considering the high heterogeneity of the ontologies published on the web, ontology matching is a crucial issue whose aim is to establish links between an entity of a source ontology and one or several entities from a target ontology. Perfectible similarity measures, considered as sources of information, are combined to establish these links. The theory of belief functions is a powerful mathematical tool for combining such uncertain information. In this paper, we introduce a decision process based on a distance measure to identify the best possible matching entities for a given source entity.
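For readers unfamiliar with belief functions, the sketch below shows Dempster's rule of combination, the classic way to merge two mass functions in this theory. The abstract does not say which combination rule the authors use, and the distance-based decision step is not sketched here; the entity names and mass values are made up for illustration.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset of hypotheses -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
            else:
                # Mass assigned to incompatible hypotheses counts as conflict.
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalize so the remaining masses sum to 1.
    return {subset: mass / (1.0 - conflict) for subset, mass in combined.items()}


if __name__ == "__main__":
    # Two hypothetical similarity measures giving evidence about which target
    # entity (e1 or e2) matches a given source entity.
    frame = frozenset({"e1", "e2"})
    measure_1 = {frozenset({"e1"}): 0.6, frame: 0.4}
    measure_2 = {frozenset({"e2"}): 0.5, frame: 0.5}
    for subset, mass in dempster_combine(measure_1, measure_2).items():
        print(sorted(subset), round(mass, 4))
```

Mass left on the whole frame expresses ignorance rather than support for a particular entity, which is what lets imperfect similarity measures be treated as uncertain sources instead of hard votes.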
Increased availability of electronic health records (EHRs) has enabled researchers to study a variety of medical questions. Selecting a cohort for the hypothesis under investigation is one of the main considerations in EHR analysis. For uncommon diseases, cohorts extracted from EHRs contain a very limited number of records, which hampers the robustness of any analysis. Data augmentation methods have been successfully applied in other domains to address this issue, mainly by using simulated records. In this paper, we present ODVICE, a data augmentation framework that leverages the medical concept ontology to systematically augment records using a novel ontologically guided Monte-Carlo graph spanning algorithm. The tool lets end users steer the augmentation process through a small set of interactive controls. We assess the value of ODVICE by conducting studies on the MIMIC-III dataset for two learning tasks. Our results demonstrate the predictive performance of ODVICE-augmented cohorts, showing ~30% improvement in area under the curve (AUC) over the non-augmented dataset and other data augmentation strategies.
Provenance is a critical ingredient for establishing trust in published scientific content. This is true whether we are considering a data set, a computational workflow, a peer-reviewed publication or a simple scientific claim with supporting evidence. Existing vocabularies such as DC Terms and the W3C PROV-O are domain-independent and general-purpose, and they allow and encourage extensions to cover more specific needs. We identify the specific need to distinguish between the various roles assumed by agents manipulating digital artifacts, such as author, contributor and curator. We present the Provenance, Authoring and Versioning ontology (PAV): a lightweight ontology for capturing just enough descriptions to track the provenance, authoring and versioning of web resources. We argue that such descriptions are essential for digital scientific content. PAV distinguishes between contributors, authors and curators of content and creators of representations, in addition to the provenance of originating resources that have been accessed, transformed and consumed. We explore five projects (and communities) that have adopted PAV, illustrating their usage through concrete examples. Moreover, we present mappings that show how PAV extends the PROV-O ontology to support broader interoperability. We strove to keep PAV lightweight and compact by including only those terms that have proven pragmatically useful in existing applications, and by recommending terms from existing ontologies when plausible. We analyze and compare PAV with related approaches, namely Provenance Vocabulary, DC Terms and BIBFRAME. We identify similarities and analyze their differences with PAV, outlining strengths and weaknesses of our proposed model. We specify SKOS mappings that align PAV with DC Terms.
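A small sketch may help show what distinguishing agent roles looks like in practice. It emits N-Triples using a handful of PAV properties (`pav:authoredBy`, `pav:curatedBy`, `pav:contributedBy`, `pav:version`, `pav:derivedFrom`) under the `http://purl.org/pav/` namespace. The helper functions `describe` and `to_ntriples` and all `example.org` URIs are hypothetical, and the sketch avoids any RDF library so that it stays self-contained.

```python
PAV = "http://purl.org/pav/"  # PAV namespace


def describe(resource, authors=(), curators=(), contributors=(), version=None, derived_from=None):
    """Return (subject, predicate, object, is_literal) tuples built from PAV terms."""
    rows = []
    rows += [(resource, PAV + "authoredBy", agent, False) for agent in authors]
    rows += [(resource, PAV + "curatedBy", agent, False) for agent in curators]
    rows += [(resource, PAV + "contributedBy", agent, False) for agent in contributors]
    if version is not None:
        rows.append((resource, PAV + "version", str(version), True))
    if derived_from is not None:
        rows.append((resource, PAV + "derivedFrom", derived_from, False))
    return rows


def to_ntriples(rows):
    """Serialize the tuples as N-Triples lines (plain string literals only)."""
    lines = []
    for subject, predicate, obj, is_literal in rows:
        if is_literal:
            escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
            term = f'"{escaped}"'
        else:
            term = f"<{obj}>"
        lines.append(f"<{subject}> <{predicate}> {term} .")
    return "\n".join(lines)


if __name__ == "__main__":
    # All URIs below are placeholders for illustration.
    rows = describe(
        "http://example.org/dataset/1",
        authors=["http://example.org/people/alice"],
        curators=["http://example.org/people/bob"],
        version="1.2",
        derived_from="http://example.org/dataset/0",
    )
    print(to_ntriples(rows))
```

Keeping the author and curator in separate properties is the distinction the abstract argues for: a consumer can tell who wrote the content apart from who maintained or vetted it.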