In this paper we argue that (lexical) meaning in science can be represented in a 13-dimensional Meaning Space. This space is constructed by applying principal component analysis (singular value decomposition) to the matrix of word-category relative information gains, where the categories are those used by the Web of Science and the words are taken from a reduced word set drawn from texts in the Web of Science. We show that this reduced word set plausibly represents all texts in the corpus, so that the principal component analysis has some objective meaning with respect to the corpus. We argue that 13 dimensions are adequate to describe the meaning of scientific texts, and we hypothesise about the qualitative meaning of the principal components.
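As a rough illustration of the construction described above, the sketch below applies a truncated SVD to a toy word-by-category matrix; the matrix values and dimensions are placeholders, and the paper's actual relative-information-gain computation is not reproduced here.

```python
import numpy as np

# Toy stand-in for the word-by-category relative information gain matrix:
# rows are words from the reduced word set, columns are Web of Science
# categories. Real values would come from the corpus statistics.
rng = np.random.default_rng(0)
n_words, n_categories = 500, 250
gain = rng.random((n_words, n_categories))

# Centre the matrix so the SVD corresponds to principal component analysis.
gain_centered = gain - gain.mean(axis=0)

# Singular value decomposition; the leading right-singular vectors span
# the candidate "Meaning Space".
U, s, Vt = np.linalg.svd(gain_centered, full_matrices=False)

# Keep the first 13 components, as argued in the paper.
k = 13
meaning_space = gain_centered @ Vt[:k].T   # word coordinates in 13 dimensions
explained = (s[:k] ** 2).sum() / (s ** 2).sum()
print(f"variance captured by {k} components: {explained:.2%}")
```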
Based on the classical Degree-Corrected Stochastic Blockmodel (DCSBM) for the network community detection problem, we propose two novel approaches: principal component clustering (PCC) and normalized principal component clustering (NPCC). Since it has no parameters to estimate, the PCC method is simple to implement. Under mild conditions, we show that PCC yields consistent community detection. NPCC is designed as a combination of PCC and the RSC method (Qin & Rohe 2013). Population analysis shows that NPCC returns perfect clustering in the ideal case under the DCSBM. PCC and NPCC are illustrated on synthetic and real-world datasets. Numerical results show that NPCC provides a significant improvement over both PCC and RSC. Moreover, NPCC inherits the nice properties of PCC and RSC: it is insensitive to the number of eigenvectors to be clustered and to the choice of the tuning parameter. For the two weak-signal networks Simmons and Caltech, by considering one additional eigenvector for clustering, we provide two refinements, PCC+ and NPCC+, of PCC and NPCC, respectively. Both refined algorithms improve on their original counterparts. In particular, NPCC+ performs satisfactorily on Simmons and Caltech, with error rates of 121/1137 and 96/590, respectively.
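A minimal sketch of the principal component clustering idea follows, assuming PCC amounts to running k-means on the leading eigenvectors of the adjacency matrix; the paper's precise algorithm may differ in details such as scaling, and the graph below is synthetic.

```python
import numpy as np
from scipy.sparse.linalg import eigsh
from sklearn.cluster import KMeans

def pcc(adjacency, n_communities):
    """Sketch of principal component clustering: cluster the rows of the
    leading eigenvectors of the adjacency matrix with k-means."""
    # Leading eigenvectors by eigenvalue magnitude.
    _, vecs = eigsh(adjacency, k=n_communities, which="LM")
    return KMeans(n_clusters=n_communities, n_init=10).fit_predict(vecs)

# Toy two-block network drawn from a stochastic blockmodel.
rng = np.random.default_rng(1)
n = 200
truth = np.repeat([0, 1], n // 2)
p = np.where(truth[:, None] == truth[None, :], 0.10, 0.02)
upper = np.triu(rng.random((n, n)) < p, 1).astype(float)
A = upper + upper.T

print(pcc(A, 2)[:10])
```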
Principal component analysis is an important pattern recognition and dimensionality reduction tool in many applications. Principal components are computed as eigenvectors of a maximum likelihood covariance $\widehat{\Sigma}$ that approximates a population covariance $\Sigma$, and these eigenvectors are often used to extract structural information about the variables (or attributes) of the studied population. Since PCA is based on the eigendecomposition of the proxy covariance $\widehat{\Sigma}$ rather than the ground-truth $\Sigma$, it is important to understand the approximation error in each individual eigenvector as a function of the number of available samples. The recent results of Koltchinskii and Lounici yield such bounds. In the present paper we sharpen these bounds and show that eigenvectors can often be reconstructed to a required accuracy from a sample whose size is of strictly smaller order.
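The sketch below illustrates the quantity such bounds control, not the bounds themselves: it measures the angle between a sample covariance eigenvector and its population counterpart as the sample size grows, for an arbitrary spiked covariance chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 50
Sigma = np.eye(d)
Sigma[0, 0] = 5.0                      # one dominant direction
top_true = np.eye(d)[:, 0]             # its population eigenvector

for n in (100, 1000, 10000):
    X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
    Sigma_hat = X.T @ X / n            # maximum likelihood covariance
    eigvals, eigvecs = np.linalg.eigh(Sigma_hat)
    top_hat = eigvecs[:, -1]           # eigenvector of the largest eigenvalue
    # sin of the angle between estimated and true top eigenvectors
    err = np.sqrt(1 - min(1.0, abs(top_true @ top_hat)) ** 2)
    print(f"n={n:6d}  sin(angle)={err:.4f}")
```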
The performance of the Self-Organizing Map (SOM) algorithm depends on the initial weights of the map. The different initialization methods can broadly be classified into random and data-analysis-based approaches. In this paper, the performance of the random initialization (RI) approach is compared to that of principal component initialization (PCI), in which the initial map weights are chosen from the space of the principal components. Performance is evaluated by the fraction of variance unexplained (FVU). Datasets were classified into quasi-linear and non-linear, and it was observed that RI performed better for non-linear datasets; however, for quasi-linear datasets the relative performance of the PCI approach remains inconclusive.
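A minimal sketch of the PCI idea follows, assuming initial weights are placed on a regular grid spanned by the first two principal components; the grid extents and scaling are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def pci_init(data, rows, cols):
    """Sketch of principal component initialization: place the initial
    SOM weights on a regular grid in the plane of the first two
    principal components of the data."""
    mean = data.mean(axis=0)
    centered = data - mean
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    pc = Vt[:2]                             # first two principal directions
    scale = s[:2] / np.sqrt(len(data))      # std of data along each direction
    # Regular grid of coefficients within +/- 2 std in the PC plane.
    a = np.linspace(-2, 2, rows)
    b = np.linspace(-2, 2, cols)
    weights = np.array([[mean + ai * scale[0] * pc[0] + bj * scale[1] * pc[1]
                         for bj in b] for ai in a])
    return weights                          # shape (rows, cols, n_features)

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 5))
print(pci_init(X, rows=10, cols=10).shape)
```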
In a recent issue of Linguistics and Philosophy, Kazmi and Pelletier (1998) (K&P) and Westerstahl (1998) criticize Zadrozny's (1994) argument that any semantics can be represented compositionally. The argument is based upon Zadrozny's theorem that every meaning function $m$ can be encoded by a function $\mu$ such that (i) for any expression $E$ of a specified language $L$, $m(E)$ can be recovered from $\mu(E)$, and (ii) $\mu$ is a homomorphism from the syntactic structures of $L$ to interpretations of $L$. In both cases, the primary motivation for the objections brought against Zadrozny's argument is the view that his encoding of the original meaning function does not properly reflect the synonymy relations posited for the language. In this paper, we argue that these technical criticisms do not go through. In particular, we prove that $\mu$ properly encodes synonymy relations, i.e. if two expressions are synonymous, then their compositional meanings are identical. This corrects some misconceptions about the function $\mu$, e.g. in Janssen (1997). We suggest that the reason semanticists have been anxious to preserve compositionality as a significant constraint on semantic theory is that it has been mistakenly regarded as a condition that must be satisfied by any theory sustaining a systematic connection between the meaning of an expression and the meanings of its parts. Recent developments in formal and computational semantics show that systematic theories of meaning need not be compositional.
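The conditions at issue can be stated compactly; the block below merely restates (i), (ii), and the synonymy claim from the abstract in symbols, with $\sigma$ introduced here as a decoding map purely for illustration.

```latex
% Restatement of Zadrozny's encoding conditions from the abstract;
% \sigma is a decoding map introduced here purely for illustration.
\begin{align*}
  &\text{(i)}\quad \exists\,\sigma\ \forall E \in L:\ m(E) = \sigma(\mu(E))
     \quad\text{($m(E)$ is recoverable from $\mu(E)$)}\\
  &\text{(ii)}\quad \mu \text{ is a homomorphism from the syntactic
     structures of } L \text{ to interpretations of } L\\
  &\text{(synonymy)}\quad m(E_1) = m(E_2) \implies \mu(E_1) = \mu(E_2)
\end{align*}
```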
We place functional constraints on the shape of the inflaton potential from the cosmic microwave background through a variant of the generalized slow roll approximation that allows large-amplitude, rapidly changing deviations from scale-free conditions. Employing a principal component decomposition of the source function $G' \approx 3(V'/V)^2 - 2V''/V$ and keeping only those components measured to better than 10% results in 5 nearly independent Gaussian constraints that may be used to test any single-field inflationary model where such deviations are expected. The first component implies < 3% variations at the 100 Mpc scale. One component shows a 95% CL preference for deviations around the 300 Mpc scale at the ~10% level, but the global significance is reduced once all 5 components examined are taken into account. This deviation also requires a change in the cold dark matter density which, in a flat LCDM model, is disfavored by current supernova and Hubble constant data, and can be tested with future polarization or high-multipole temperature data. Its impact resembles a local running of the tilt from multipoles 30-800 but is only marginally consistent with a constant running beyond this range. For this analysis, we have implemented a ~40x faster WMAP7 likelihood method which we have made publicly available.
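A generic illustration of the truncation rule follows, interpreting "measured to better than 10%" as a relative uncertainty below 0.1 on each principal component amplitude; this is one plausible reading, the amplitudes below are toy numbers, and the paper's PC basis and likelihood machinery are not reproduced.

```python
import numpy as np

# Toy posterior means and standard deviations for five PC amplitudes.
means  = np.array([0.80, 0.55, 0.30, 0.12, 0.05])
sigmas = np.array([0.02, 0.04, 0.02, 0.03, 0.04])

# Retain a component only when it is measured to better than 10%.
threshold = 0.10
keep = sigmas / np.abs(means) < threshold
print("components kept:", np.flatnonzero(keep))
```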