With the increasing popularity and availability of digital text data, the authorship of a digital text cannot be taken for granted, owing to the ease of copying and pasting. This paper presents a new text style analysis called natural frequency zoned word distribution analysis (NFZ-WDA), and then builds a basic authorship attribution scheme and an open authorship attribution scheme for digital texts on top of this analysis. NFZ-WDA is based on the observation that every author leaves distinct, intrinsic word usage traces in the texts they write, and that these intrinsic styles can be identified and employed to analyze authorship. The intrinsic word usage style can be estimated by analyzing the word distribution within a text, which goes beyond ordinary word frequency analysis and can be expressed as three questions: which groups of words are used in the text; how frequently each group of words occurs; and how the occurrences of each group are distributed across the text. The basic authorship attribution scheme and the open authorship attribution scheme then provide solutions for the closed and open authorship attribution problems, respectively. Through analysis and extensive experimental studies, the paper demonstrates the efficiency of the proposed method for authorship attribution.
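As a rough illustration of those three questions, the sketch below zones a toy vocabulary by global frequency rank and reports, for each zone, its size, its share of tokens, and a crude positional spread. The equal-width zoning, the whitespace tokenizer, and the mean-position spread statistic are illustrative assumptions, not the paper's exact NFZ-WDA definitions.

```python
# A minimal sketch of the three NFZ-WDA questions, assuming a simple
# equal-width zoning of the vocabulary by frequency rank.
from collections import Counter

def nfz_wda_profile(tokens, n_zones=5):
    freq = Counter(tokens)
    # Rank words from most to least frequent, then split the ranked
    # vocabulary into n_zones natural-frequency zones.
    ranked = [w for w, _ in freq.most_common()]
    zone_size = max(1, len(ranked) // n_zones)
    zone_of = {w: min(i // zone_size, n_zones - 1) for i, w in enumerate(ranked)}

    profile = []
    for z in range(n_zones):
        # (1) which group of words: the members of zone z
        members = [w for w in ranked if zone_of[w] == z]
        # (2) how frequently the group occurs: its share of all tokens
        occ = [i for i, t in enumerate(tokens) if zone_of[t] == z]
        rate = len(occ) / len(tokens)
        # (3) how its occurrences are spread through the text: mean
        # normalized position (0 = start, 1 = end) as a crude statistic
        spread = sum(occ) / (len(occ) * (len(tokens) - 1)) if occ else 0.0
        profile.append({"zone": z, "words": len(members),
                        "rate": rate, "spread": spread})
    return profile

text = "the quick brown fox jumps over the lazy dog and the fox runs".split()
for row in nfz_wda_profile(text, n_zones=3):
    print(row)
```

Comparing such per-zone profiles across documents, rather than raw frequency counts alone, is the intuition behind using word distribution as an authorial fingerprint.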
Word frequency is assumed to correlate with word familiarity, but the strength of this correlation has not been thoroughly investigated. In this paper, we report on our analysis of the correlation between a word familiarity rating list obtained through …
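A minimal sketch of the kind of check this abstract describes, assuming familiarity norms and corpus counts are available as simple dictionaries. The toy ratings and counts below are invented; a real study would use a published familiarity rating list and a large corpus.

```python
# Correlate word-familiarity ratings with (log-scaled) corpus frequency.
# All data here is fabricated for illustration.
import math
from scipy.stats import spearmanr

familiarity = {"the": 6.9, "house": 6.5, "ponder": 4.2, "quixotic": 2.1}
corpus_freq = {"the": 5_000_000, "house": 120_000, "ponder": 3_500, "quixotic": 220}

words = sorted(familiarity)
ratings = [familiarity[w] for w in words]
log_freq = [math.log10(corpus_freq[w]) for w in words]  # frequency is usually log-scaled

rho, p = spearmanr(ratings, log_freq)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```

A rank correlation such as Spearman's rho is a reasonable default here, since familiarity ratings are ordinal and frequency is heavily skewed.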
The distribution of frequency counts of distinct words by length in a language's vocabulary will be analyzed using two methods. The first will look at the empirical distributions of several languages and derive a distribution that reasonably explains …
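The empirical side of the first method could be sketched as follows: tabulate how many distinct vocabulary words occur at each word length and normalize to a distribution. The tiny vocabulary is a placeholder for a real per-language lexicon.

```python
# Empirical distribution of distinct-word counts by word length.
# The vocabulary below is a toy stand-in for a full lexicon.
from collections import Counter

def length_distribution(vocabulary):
    counts = Counter(len(w) for w in vocabulary)
    total = sum(counts.values())
    return {length: counts[length] / total for length in sorted(counts)}

vocab = {"a", "an", "the", "word", "words", "language", "vocabulary"}
for length, share in length_distribution(vocab).items():
    print(f"length {length}: {share:.2f}")
```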
Robustness against word substitutions has a well-defined and widely accepted form, i.e., using semantically similar words as substitutions, and thus it is considered a fundamental stepping stone towards broader robustness in natural language processing …
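Under that definition, a simple robustness check enumerates every sentence reachable by swapping words for semantically similar ones and tests whether a model's prediction stays constant. The synonym table and keyword classifier below are toy stand-ins for a real synonym source (e.g., WordNet) and a real model.

```python
# Exhaustive synonym-substitution robustness check on a toy example.
from itertools import product

SYNONYMS = {"good": ["great", "fine"], "movie": ["film"]}  # toy synonym table

def substitutions(tokens):
    # Enumerate every sentence reachable by swapping words for listed synonyms.
    options = [[t] + SYNONYMS.get(t, []) for t in tokens]
    return [" ".join(combo) for combo in product(*options)]

def toy_classifier(sentence):
    # Keyword stand-in for a real sentiment model.
    return "positive" if any(w in sentence for w in ("good", "great", "fine")) else "negative"

original = "a good movie".split()
preds = {toy_classifier(s) for s in substitutions(original)}
print("robust to substitutions:", len(preds) == 1)
```

Real certification methods avoid this exponential enumeration, but the pass/fail criterion is the same: one prediction across the whole substitution neighborhood.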
We propose an unsupervised neural model for learning a discrete embedding of words. Unlike existing discrete embeddings, our binary embedding supports vector arithmetic operations similar to continuous embeddings. Our embedding represents each word …
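One plausible reading of vector arithmetic on binary embeddings is sketched below: map bits to plus/minus one, add and subtract, re-binarize by sign, and retrieve the nearest code by Hamming distance. The 8-bit codes are fabricated for illustration; the abstract's model would learn such codes, and this is not necessarily the paper's exact procedure.

```python
# Arithmetic on binary word codes via a signed (+1/-1) intermediate space.
# The codes below are hand-picked so that king - man + woman = queen.
import numpy as np

codes = {
    "king":  np.array([1, 1, 0, 1, 0, 1, 1, 0]),
    "man":   np.array([1, 0, 0, 1, 0, 0, 1, 0]),
    "woman": np.array([0, 0, 1, 1, 0, 0, 1, 1]),
    "queen": np.array([0, 1, 1, 1, 0, 1, 1, 1]),
}

def arithmetic(pos, neg):
    # Add/subtract in +/-1 space, then re-binarize by sign.
    signed = sum(2 * codes[w] - 1 for w in pos) - sum(2 * codes[w] - 1 for w in neg)
    return (signed >= 0).astype(int)

def nearest(code, exclude):
    # Retrieve the closest stored code by Hamming distance.
    return min((w for w in codes if w not in exclude),
               key=lambda w: int(np.sum(codes[w] != code)))

result = arithmetic(pos=["king", "woman"], neg=["man"])
print(nearest(result, exclude={"king", "man", "woman"}))  # prints "queen"
```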
Co-occurrence statistics based word embedding techniques have proved to be very useful in extracting the semantic and syntactic representation of words as low-dimensional continuous vectors. In this work, we discovered that dictionary learning can …
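As a hedged sketch of applying dictionary learning to co-occurrence-based word vectors, the snippet below re-expresses each embedding as a sparse combination of learned atoms using scikit-learn's DictionaryLearning. The random matrix stands in for real pretrained vectors (e.g., GloVe), and all sizes and hyperparameters are illustrative.

```python
# Sparse-code pretrained word embeddings with dictionary learning.
# The random matrix is a placeholder for real vectors such as GloVe.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((50, 20))  # 50 "words", 20-dim vectors

dl = DictionaryLearning(n_components=10, transform_algorithm="lasso_lars",
                        transform_alpha=0.5, random_state=0, max_iter=100)
sparse_codes = dl.fit_transform(embeddings)  # sparse code, one row per word
atoms = dl.components_                       # learned dictionary atoms (10 x 20)

print("mean nonzeros per word:", (sparse_codes != 0).sum(axis=1).mean())
```

The appeal of this decomposition is interpretability: each learned atom can act as a candidate elementary meaning component shared across many words.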