Using compression to identify acronyms in text

Published by: Stuart Yeates

Publication date: 2000

Research field: Informatics Engineering

Language: English

Authors: Stuart Yeates, David Bainbridge, Ian H. Witten

Digital Libraries, Information Retrieval

Abstract

Text mining is about looking for patterns in natural language text, and may be defined as the process of analyzing text to extract information from it for particular purposes. In previous work, we claimed that compression is a key technology for text mining, and backed this up with a study that showed how particular kinds of lexical tokens (names, dates, locations, etc.) can be identified and located in running text, using compression models to provide the leverage necessary to distinguish different token types (Witten et al., 1999).
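As a rough illustration of how compression models can separate token types, the sketch below trains one character model per category and assigns a token to the category whose model would encode it in the fewest bits. It is a toy under stated assumptions, not the authors' method: the order-0 model with add-one smoothing is a much simpler stand-in for the adaptive models a real text compressor uses, and the training strings, the 256-symbol alphabet, and the names `CharModel` and `classify` are invented for this example.

```python
import math
from collections import Counter

ALPHABET_SIZE = 256  # assumed symbol space for add-one smoothing


class CharModel:
    """Order-0 character model; a toy stand-in for a real compression model."""

    def __init__(self, training_text: str) -> None:
        self.counts = Counter(training_text)
        self.total = len(training_text)

    def bits(self, token: str) -> float:
        """Ideal code length of `token` in bits under this model."""
        cost = 0.0
        for ch in token:
            p = (self.counts[ch] + 1) / (self.total + ALPHABET_SIZE)
            cost += -math.log2(p)
        return cost


def classify(token: str, models: dict[str, CharModel]) -> str:
    """Return the label whose model encodes the token most cheaply."""
    return min(models, key=lambda label: models[label].bits(token))


# Hypothetical training data, far too small for real use.
models = {
    "date": CharModel("1999 2000 1987 12/03/1998 2001"),
    "name": CharModel("Stuart David Ian Witten Bainbridge Yeates"),
}

print(classify("1995", models))    # digits are cheap under the date model
print(classify("Witten", models))  # letters are cheap under the name model
```

The design choice here is that a shorter code length means a better fit: a token made of symbols the model has seen often costs few bits, so comparing costs across models acts as a classifier.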

Yuzhuo Wang, Chengzhi Zhang (2020)

In the era of big data, the advancement, improvement, and application of algorithms in academic research have played an important role in promoting the development of different disciplines. Academic papers in various disciplines, especially computer science, contain a large number of algorithms. Identifying the algorithms from the full-text content of papers can determine popular or classical algorithms in a specific field and help scholars gain a comprehensive understanding of the algorithms and even the field. To this end, this article takes the field of natural language processing (NLP) as an example and identifies algorithms from academic papers in the field. A dictionary of algorithms is constructed by manually annotating the contents of papers, and sentences containing algorithms in the dictionary are extracted through dictionary-based matching. The number of articles mentioning an algorithm is used as an indicator to analyze the influence of that algorithm. Our results reveal the algorithm with the highest influence in NLP papers and show that classification algorithms represent the largest proportion among the high-impact algorithms. In addition, the evolution of the influence of algorithms reflects the changes in research tasks and topics in the field, and the changes in the influence of different algorithms show different trends. As a preliminary exploration, this paper conducts an analysis of the impact of algorithms mentioned in the academic text, and the results can be used as training data for the automatic extraction of large-scale algorithms in the future. The methodology in this paper is domain-independent and can be applied to other domains.

Computation and Language, Information Retrieval, Machine Learning
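The dictionary-based matching and article-count indicator described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the dictionary entries, the sample papers, and the function name `article_counts` are hypothetical, and matching is a plain case-insensitive whole-phrase search.

```python
import re
from collections import Counter


def article_counts(papers: dict[str, str], dictionary: list[str]) -> Counter:
    """Count, for each dictionary entry, how many papers mention it."""
    counts: Counter = Counter()
    for text in papers.values():
        for name in dictionary:
            # Whole-phrase, case-insensitive match; each paper counts once.
            pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                counts[name] += 1
    return counts


# Hypothetical dictionary and toy "full texts".
dictionary = ["support vector machine", "naive Bayes", "LSTM"]
papers = {
    "p1": "We train a support vector machine and a naive Bayes baseline.",
    "p2": "An LSTM encoder outperforms the support vector machine.",
    "p3": "Our LSTM uses an attention layer. The LSTM is bidirectional.",
}

counts = article_counts(papers, dictionary)
for name, n in counts.most_common():
    print(f"{name}: {n} paper(s)")
```

Counting papers rather than total mentions keeps a single paper that repeats one term many times from dominating the indicator.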

Using Full-text Content of Academic Articles to Build a Methodology Taxonomy of Information Science in China

Heng Zhang, Chengzhi Zhang (2021)

Research on the construction of traditional information science methodology taxonomy is mostly conducted manually. From the limited corpus, researchers have attempted to summarize some of the research methodology entities into several abstract levels (generally three levels); however, they have been unable to provide a more granular hierarchy. Moreover, updating the methodology taxonomy is traditionally a slow process. In this study, we collected full-text academic papers related to information science. First, we constructed a basic methodology taxonomy with three levels by manual annotation. Then, the word vectors of the research methodology entities were trained using the full-text data. Accordingly, the research methodology entities were clustered and the basic methodology taxonomy was expanded using the clustering results to obtain a methodology taxonomy with more levels. This study provides new concepts for constructing a methodology taxonomy of information science. The proposed methodology taxonomy is semi-automated; it is more detailed than conventional schemes and the speed of taxonomy renewal has been enhanced.

Digital Libraries, Computation and Language

A Computational Approach to Historical Ontologies

Mat Kelly (2020)

This paper presents a use case exploring the application of the Archival Resource Key (ARK) persistent identifier for promoting and maintaining ontologies. In particular, we look at improving computation with an in-house ontology server in the context of temporally aligned vocabularies. This effort demonstrates the utility of ARKs in preparing historical ontologies for computational archival science.

Digital Libraries, Information Retrieval

Need to categorize: A comparative look at the categories of the Universal Decimal Classification system (UDC) and Wikipedia

Almila Akdag Salah, Cheng Gao, Krzysztof Suchecki (2011)

This study analyzes the differences between the category structure of the Universal Decimal Classification (UDC) system (one of the most widely used library classification systems in Europe) and that of Wikipedia. In particular, we compare the emerging structure of category links to the structure of classes in the UDC. With this comparison we examine how knowledge maps of the same domain differ when they are created socially (i.e. Wikipedia) as opposed to formally (UDC) using classification theory. As a case study, we focus on the category of Arts.

Digital Libraries, Information Retrieval, Physics and Society

Finding Quality Issues in SKOS Vocabularies

Christian Mader, Bernhard Haslhofer, Antoine Isaac (2012)

The Simple Knowledge Organization System (SKOS) is a standard model for controlled vocabularies on the Web. However, SKOS vocabularies often differ in terms of quality, which reduces their applicability across system boundaries. Here we investigate how we can support taxonomists in improving SKOS vocabularies by pointing out quality issues that go beyond the integrity constraints defined in the SKOS specification. We identified potential quantifiable quality issues and formalized them into computable quality checking functions that can find affected resources in a given SKOS vocabulary. We implemented these functions in the qSKOS quality assessment tool, analyzed 15 existing vocabularies, and found possible quality issues in all of them.

Digital Libraries, Information Retrieval
