While there exist scores of natural languages, each with its unique features and idiosyncrasies, they all share a unifying theme: enabling human communication. We may thus reasonably predict that human cognition shapes how these languages evolve and are used. Assuming that the capacity to process information is roughly constant across human populations, we expect a surprisal–duration trade-off to arise both across and within languages. We analyse this trade-off using a corpus of 600 languages and, after controlling for several potential confounds, we find strong supporting evidence in both settings. Specifically, we find that, on average, phones are produced faster in languages where they are less surprising, and vice versa. Further, we confirm that more surprising phones are longer, on average, in 319 languages out of the 600. We thus conclude that there is strong evidence of a surprisal–duration trade-off in operation, both across and within the world's languages.
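To make the quantity in the trade-off concrete: the surprisal of a phone is its negative log-probability given its context, and the within-language claim is that surprisal and duration are positively related. The sketch below is only an illustration under loud assumptions. It uses a smoothed bigram model fitted on a toy sequence, made-up durations, and a plain Pearson correlation. The function names `bigram_surprisals` and `pearson`, the smoothing constant `alpha`, and all data are hypothetical; none of this reproduces the models, corpus, or confound controls used in the study.

```python
import math
from collections import Counter


def bigram_surprisals(phones, alpha=1.0):
    """Surprisal in bits, -log2 p(phone | previous phone), for every phone
    after the first. Probabilities come from add-alpha smoothed bigram counts
    estimated on the same sequence (an illustrative simplification)."""
    vocab_size = len(set(phones))
    context_counts = Counter(phones[:-1])
    bigram_counts = Counter(zip(phones[:-1], phones[1:]))
    surprisals = []
    for prev, cur in zip(phones[:-1], phones[1:]):
        p = (bigram_counts[(prev, cur)] + alpha) / (
            context_counts[prev] + alpha * vocab_size
        )
        surprisals.append(-math.log2(p))
    return surprisals


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: a phone sequence and made-up durations (ms)
# for every phone after the first one.
toy_phones = ["a", "b", "a", "b", "a", "c"]
toy_durations_ms = [60.0, 70.0, 62.0, 71.0, 95.0]

toy_surprisals = bigram_surprisals(toy_phones)
toy_r = pearson(toy_surprisals, toy_durations_ms)
print(f"correlation between surprisal and duration: {toy_r:.3f}")
```

A positive coefficient on real data would point in the direction of the trade-off, but a raw correlation like this one does not control for confounds, which the analysis described above explicitly does.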
References used
https://aclanthology.org/