Predicting the difficulty of domain-specific vocabulary is an important step towards a better understanding of a domain and towards better communication between lay people and experts. We investigate German closed noun compounds and focus on the interaction of compound-based lexical features (such as frequency and productivity) and terminology-based features (contrasting domain-specific and general language) across word representations and classifiers. Our prediction experiments complement insights from classification using (a) manually designed features to characterise termhood and compound formation and (b) compound and constituent word embeddings. We find that for a broad binary distinction into 'easy' vs. 'difficult', general-language compound frequency is sufficient, but for a more fine-grained four-class distinction it is crucial to include contrastive termhood features as well as compound and constituent features.
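To make the idea of a contrastive termhood feature more concrete, the following is a minimal sketch of one common way to contrast domain-specific and general-language usage: a smoothed log-ratio of relative frequencies. It is not the feature set used in the paper; the function name `contrastive_termhood`, the add-one smoothing, and the toy counts are all assumptions made for illustration.

```python
import math
from collections import Counter


def contrastive_termhood(term, domain_counts, general_counts, smoothing=1.0):
    """Log-ratio of smoothed relative frequencies (domain vs. general corpus).

    Hypothetical illustration: positive values suggest the term is more
    typical of the domain corpus, negative values of the general corpus.
    """
    vocab_size = len(set(domain_counts) | set(general_counts))
    domain_total = sum(domain_counts.values())
    general_total = sum(general_counts.values())

    # Additive smoothing keeps the ratio defined for unseen terms.
    p_domain = (domain_counts.get(term, 0) + smoothing) / (
        domain_total + smoothing * vocab_size
    )
    p_general = (general_counts.get(term, 0) + smoothing) / (
        general_total + smoothing * vocab_size
    )
    return math.log(p_domain / p_general)


# Toy counts, invented for this sketch only.
domain_counts = Counter({"Schraubendreher": 8, "Haus": 2})
general_counts = Counter({"Schraubendreher": 1, "Haus": 9})

score_domain_term = contrastive_termhood(
    "Schraubendreher", domain_counts, general_counts
)
score_general_term = contrastive_termhood("Haus", domain_counts, general_counts)
```

In a setup like the one described above, a score of this kind could be computed for the compound and for each constituent and passed to a classifier alongside frequency and productivity features; how the actual features were defined is not stated in the abstract.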