
Towards a system for constructing Arabic Ontology based on natural text

Arabic title: Building the core of an assistant system for constructing an Arabic ontology from texts

Publication date: 2011
Research language: Arabic





This paper presents ArOntoLearn, a framework for Arabic ontology learning from textual resources. Supporting the Arabic language and using domain knowledge in the learning process are the main features of our framework. In addition, it represents the learned ontology in a Probabilistic Ontology Model (POM), which can be translated into any knowledge representation formalism, and it implements data-driven change discovery: the POM is updated only in response to changes in the corpus, allowing the user to trace the evolution of the ontology with respect to the underlying texts. Our framework analyses Arabic textual resources and matches them against Arabic lexico-syntactic patterns in order to learn new concepts and relations. Supporting Arabic is not an easy task, because current linguistic analysis tools are not efficient enough to process unvocalized Arabic corpora, which rarely contain appropriate punctuation. We therefore built a flexible, freely configurable framework in which any linguistic analysis tool can be replaced by a more sophisticated one whenever it becomes available.
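As a concrete illustration of the pattern-matching step, the following Python sketch extracts hyponymy candidates from Arabic text with a Hearst-style pattern and records them in a toy probabilistic model. The pattern, the POM class, and the scoring are simplified assumptions made for illustration, not the paper's actual implementation.

    # -*- coding: utf-8 -*-
    # Toy sketch: Hearst-style hyponymy extraction from Arabic text.
    # Pattern, POM class, and scoring are illustrative assumptions,
    # not ArOntoLearn's actual implementation.
    import re
    from collections import defaultdict

    # Arabic analogue of "NP such as NP": "X مثل Y" ("X mithl Y").
    # A real system would use morphological analysis, not a raw regex.
    HYPONYMY = re.compile(r"(\w+)\s+مثل\s+(\w+)")

    class POM:
        """Toy Probabilistic Ontology Model: relation confidence
        grows with the evidence found in the corpus."""
        def __init__(self):
            self.evidence = defaultdict(int)  # (hypernym, hyponym) -> count
            self.total = 0

        def add(self, hypernym, hyponym):
            self.evidence[(hypernym, hyponym)] += 1
            self.total += 1

        def confidence(self, pair):
            return self.evidence[pair] / self.total  # relative frequency

    pom = POM()
    text = "تحتوي الجامعة على كليات مثل الهندسة"  # "... faculties such as engineering"
    for hypernym, hyponym in HYPONYMY.findall(text):
        pom.add(hypernym, hyponym)

    for pair in pom.evidence:
        print(pair[1], "is-a", pair[0], "confidence:", pom.confidence(pair))

Because the model only accumulates evidence, re-running it after the corpus changes updates the confidences incrementally, which is the intuition behind data-driven change discovery.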



Related research

Biomaterials are synthetic or natural materials used for constructing artificial organs, fabricating prostheses, or replacing tissues. The last century saw the development of thousands of novel biomaterials and, as a result, an exponential increase in scientific publications in the field. Large-scale analysis of biomaterials and their performance could enable data-driven material selection and implant design. However, such analysis requires identification and organization of concepts, such as materials and structures, from published texts. To facilitate future information extraction and the application of machine-learning techniques, we developed a semantic annotator specifically tailored for the biomaterials literature. The Biomaterials Annotator has been implemented following a modular organization using software containers for the different components and orchestrated using Nextflow as workflow manager. Natural language processing (NLP) components are mainly developed in Java. This set-up has allowed named entity recognition of seventeen classes relevant to the biomaterials domain. Here we detail the development, evaluation and performance of the system, as well as the release of the first collection of annotated biomaterials abstracts. We make both the corpus and system available to the community to promote future efforts in the field and contribute towards its sustainability.
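As a rough illustration of what such annotation produces, the sketch below tags a toy biomaterials sentence using a small hand-made gazetteer. The class labels and term lists are invented for this example; the published system relies on full NLP components rather than dictionary lookup.

    # Toy sketch: dictionary-based tagging of a biomaterials sentence.
    # Class labels and term lists are invented; the real Biomaterials
    # Annotator uses full NLP components (Java) orchestrated by Nextflow.
    GAZETTEER = {
        "MATERIAL":  ["titanium", "hydroxyapatite", "collagen"],
        "STRUCTURE": ["scaffold", "membrane", "coating"],
    }

    def tag(text):
        """Return (term, label, offset) triples found in the text."""
        hits, lower = [], text.lower()
        for label, terms in GAZETTEER.items():
            for term in terms:
                start = lower.find(term)
                while start != -1:
                    hits.append((text[start:start + len(term)], label, start))
                    start = lower.find(term, start + 1)
        return sorted(hits, key=lambda h: h[2])

    sentence = "A porous titanium scaffold coated with hydroxyapatite."
    for term, label, offset in tag(sentence):
        print(f"{offset:3d}  {label:9s}  {term}")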
Recent question answering and machine reading benchmarks frequently reduce the task to pinpointing spans within a given text passage that answer the question. Typically, these systems are not required to understand the text on a deeper level that would allow more complex reasoning over the information it contains. We introduce a new dataset called BiQuAD that requires deeper comprehension in order to answer questions in both extractive and deductive fashion. The dataset consists of 4,190 closed-domain texts and a total of 99,149 question-answer pairs. The texts are synthetically generated soccer match reports that verbalize the main events of each match. All texts are accompanied by a structured Datalog program that represents a (logical) model of their information. We show that state-of-the-art QA models do not perform well on the challenging long-form contexts and reasoning requirements posed by the dataset. In particular, transformer-based state-of-the-art models achieve F1-scores of only 39.0. We demonstrate how these synthetic datasets align structured knowledge with natural text and aid model introspection when approaching complex text understanding.
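The following sketch shows, with invented predicate names and report text, how a verbalized match event can be paired with Datalog-style facts, and why a deductive question requires reasoning over those facts rather than span extraction.

    # Toy sketch: pairing a verbalized match report with Datalog-style
    # facts, as BiQuAD does at scale. Predicates and text are invented.
    report = "Miller scored in minute 12. Garcia scored in minute 57."

    facts = [            # the structured (logical) model of the report
        ("goal", "miller", 12),
        ("goal", "garcia", 57),
    ]

    # "Who scored last?" is deductive: it cannot be answered by copying
    # a span from the report, only by reasoning over the facts.
    def last_scorer(facts):
        goals = [f for f in facts if f[0] == "goal"]
        return max(goals, key=lambda f: f[2])[1] if goals else None

    print(last_scorer(facts))  # -> garcia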
Psychometric measures of ability, attitudes, perceptions, and beliefs are crucial for understanding user behavior in various contexts including health, security, e-commerce, and finance. Traditionally, psychometric dimensions have been measured and collected using survey-based methods. Inferring such constructs from user-generated text could allow timely, unobtrusive collection and analysis. In this paper we describe our efforts to construct a corpus for psychometric natural language processing (NLP) related to important dimensions such as trust, anxiety, numeracy, and literacy, in the health domain. We discuss our multi-step process to align user text with their survey-based response items and provide an overview of the resulting testbed which encompasses survey-based psychometric measures and accompanying user-generated text from 8,502 respondents. Our testbed also encompasses self-reported demographic information, including race, sex, age, income, and education, thereby affording opportunities for measuring bias and benchmarking fairness of text classification methods. We report preliminary results on use of the text to predict/categorize users' survey response labels, and on the fairness of these models. We also discuss the important implications of our work and resulting testbed for future NLP research on psychometrics and fairness.
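A minimal sketch of the kind of experiment such a testbed enables: train a text classifier on survey-derived labels, then measure a simple demographic parity gap between groups. All texts, labels, and the group attribute below are invented for illustration.

    # Toy sketch: predict a survey-derived label from user text, then
    # measure a simple fairness gap. Texts, labels, and the group
    # attribute are invented; the real testbed has 8,502 respondents.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    texts  = ["I double-check every dosage myself",
              "I trust whatever my doctor prescribes",
              "numbers on labels confuse me",
              "I always read the statistics in studies"]
    labels = [1, 0, 0, 1]          # e.g. high (1) vs. low (0) numeracy
    groups = ["a", "b", "a", "b"]  # a self-reported demographic attribute

    X = TfidfVectorizer().fit_transform(texts)
    pred = LogisticRegression().fit(X, labels).predict(X)

    # Demographic parity gap: difference in positive-prediction rates.
    def rate(g):
        return sum(int(p) for p, gr in zip(pred, groups) if gr == g) / groups.count(g)

    print("parity gap:", abs(rate("a") - rate("b")))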
Most of the time, when dealing with a particular Natural Language Processing task, systems are compared on the basis of global statistics such as recall, precision, F1-score, etc. While such scores provide a general idea of the behavior of these systems, they ignore a key piece of information that can be useful for assessing progress and discerning remaining challenges: the relative difficulty of test instances. To address this shortcoming, we introduce the notion of differential evaluation which effectively defines a pragmatic partition of instances into gradually more difficult bins by leveraging the predictions made by a set of systems. Comparing systems along these difficulty bins enables us to produce a finer-grained analysis of their relative merits, which we illustrate on two use-cases: a comparison of systems participating in a multi-label text classification task (CLEF eHealth 2018 ICD-10 coding), and a comparison of neural models trained for biomedical entity detection (BioCreative V chemical-disease relations dataset).
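The binning itself is straightforward to implement. The sketch below, using an invented correctness matrix, assigns each instance to a difficulty bin equal to the number of systems that predicted it correctly, then reports per-bin accuracy for each system.

    # Toy sketch of differential evaluation: each instance goes into a
    # difficulty bin equal to the number of systems that got it right
    # (0 = hardest). The correctness matrix is invented.
    from collections import defaultdict

    correct = [            # correct[s][i] = 1 if system s solved instance i
        [1, 1, 0, 1, 0],   # system A
        [1, 0, 0, 1, 0],   # system B
        [1, 1, 0, 1, 1],   # system C
    ]

    bins = defaultdict(list)
    for i in range(len(correct[0])):
        bins[sum(sys[i] for sys in correct)].append(i)

    for k in sorted(bins):
        print(f"solved by {k}/3 systems: instances {bins[k]}")

    # Per-bin accuracy exposes where each system wins or loses:
    for s, name in enumerate("ABC"):
        for k in sorted(bins):
            acc = sum(correct[s][i] for i in bins[k]) / len(bins[k])
            print(f"system {name}, bin {k}: accuracy {acc:.2f}")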
The ability to search Web sites has become essential for many people, yet many sites fail to give users the information they need. Search operations are typically limited to keyword matching and do not take into consideration the underlying semantics of the content. Present technologies support most languages, though Arabic is still not well supported. Semantics is one of the main application areas of ontology technology. Although there are many tools for developing ontologies in many languages, Arabic WordNet seems to be the only one that supports the Arabic language. In this paper we define the necessary steps to develop an Arabic ontology for university sites using Arabic WordNet, and we check that the developed ontology is clean.
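As a hedged sketch of the first of those steps, the snippet below looks up candidate university-domain terms in Arabic WordNet through NLTK's Open Multilingual WordNet interface (language code 'arb'). Coverage of these particular terms, and the use of NLTK at all, are assumptions of this sketch rather than the paper's method.

    # Toy sketch: seed university-domain concepts from Arabic WordNet,
    # here via NLTK's Open Multilingual WordNet (language code 'arb').
    # Term coverage, and NLTK itself, are assumptions of this sketch.
    import nltk
    nltk.download("wordnet", quiet=True)
    nltk.download("omw-1.4", quiet=True)
    from nltk.corpus import wordnet as wn

    for term in ["جامعة", "كلية", "طالب"]:  # university, faculty, student
        for syn in wn.synsets(term, lang="arb"):
            hypernyms = [h.name() for h in syn.hypernyms()]
            print(term, "->", syn.name(), "hypernyms:", hypernyms)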