Identifying intertextual relationships between authors is of central importance to the study of literature. We report an empirical analysis of intertextuality in classical Latin literature using word embedding models. To enable quantitative evaluation of intertextual search methods, we curate a new dataset of 945 known parallels drawn from traditional scholarship on Latin epic poetry. We train an optimized word2vec model on a large corpus of lemmatized Latin, which achieves state-of-the-art performance for synonym detection and outperforms a widely used lexical method for intertextual search. We then demonstrate that training embeddings on very small corpora can capture salient aspects of literary style and apply this approach to replicate a previous intertextual study of the Roman historian Livy, which relied on hand-crafted stylometric features. Our results advance the development of core computational resources for a major premodern language and highlight a productive avenue for cross-disciplinary collaboration between the study of literature and NLP.