ﻻ يوجد ملخص باللغة العربية
Word frequency is assumed to correlate with word familiarity, but the strength of this correlation has not been thoroughly investigated. In this paper, we report on our analysis of the correlation between a word familiarity rating list obtained through a psycholinguistic experiment and the log-frequency obtained from various corpora of different kinds and sizes (up to the terabyte scale) for English and Japanese. Major findings are threefold: First, for a given corpus, familiarity is necessary for a word to achieve high frequency, but familiar words are not necessarily frequent. Second, correlation increases with the corpus data size. Third, a corpus of spoken language correlates better than one of written language. These findings suggest that cognitive familiarity ratings are correlated to frequency, but more highly to that of spoken rather than written language.
With such increasing popularity and availability of digital text data, authorships of digital texts can not be taken for granted due to the ease of copying and parsing. This paper presents a new text style analysis called natural frequency zoned word
Traditional linguistic theories have largely regard language as a formal system composed of rigid rules. However, their failures in processing real language, the recent successes in statistical natural language processing, and the findings of many ps
When one is presented with an item or a face, one can sometimes have a sense of recognition without being able to recall where or when one has encountered it before. This sense of recognition is known as familiarity. Following previous computational
There is a great deal of work in cognitive psychology, linguistics, and computer science, about using word (or phrase) frequencies in context in text corpora to develop measures for word similarity or word association, going back to at least the 1960
Word meaning has different aspects, while the existing word representation compresses these aspects into a single vector, and it needs further analysis to recover the information in different dimensions. Inspired by quantum probability, we represent