Principal Components of the Meaning

85 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Neslihan Suzen

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Neslihan Suzen - Alexander Gorban - Jeremy Levesley

الحساب واللغة التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

In this paper we argue that (lexical) meaning in science can be represented in a 13 dimension Meaning Space. This space is constructed using principal component analysis (singular decomposition) on the matrix of word category relative information gains, where the categories are those used by the Web of Science, and the words are taken from a reduced word set from texts in the Web of Science. We show that this reduced word set plausibly represents all texts in the corpus, so that the principal component analysis has some objective meaning with respect to the corpus. We argue that 13 dimensions is adequate to describe the meaning of scientific texts, and hypothesise about the qualitative meaning of the principal components.

قيم البحث

106 - Huan Qing , Jingli Wang 2020

Based on the classical Degree Corrected Stochastic Blockmodel (DCSBM) model for network community detection problem, we propose two novel approaches: principal component clustering (PCC) and normalized principal component clustering (NPCC). Without a ny parameters to be estimated, the PCC method is simple to be implemented. Under mild conditions, we show that PCC yields consistent community detection. NPCC is designed based on the combination of the PCC and the RSC method (Qin & Rohe 2013). Population analysis for NPCC shows that NPCC returns perfect clustering for the ideal case under DCSBM. PCC and NPCC is illustrated through synthetic and real-world datasets. Numerical results show that NPCC provides a significant improvement compare with PCC and RSC. Moreover, NPCC inherits nice properties of PCC and RSC such that NPCC is insensitive to the number of eigenvectors to be clustered and the choosing of the tuning parameter. When dealing with two weak signal networks Simmons and Caltech, by considering one more eigenvectors for clustering, we provide two refinements PCC+ and NPCC+ of PCC and NPCC, respectively. Both two refinements algorithms provide improvement performances compared with their original algorithms. Especially, NPCC+ provides satisfactory performances on Simmons and Caltech, with error rates of 121/1137 and 96/590, respectively.

التعلم الالي التعلم الآلي الشبكات الاجتماعية والمعلومات

Quantifying the Estimation Error of Principal Components

332 - Raphael Hauser , Raul Kangro , Juri Lember 2017

Principal component analysis is an important pattern recognition and dimensionality reduction tool in many applications. Principal components are computed as eigenvectors of a maximum likelihood covariance $widehat{Sigma}$ that approximates a populat ion covariance $Sigma$, and these eigenvectors are often used to extract structural information about the variables (or attributes) of the studied population. Since PCA is based on the eigendecomposition of the proxy covariance $widehat{Sigma}$ rather than the ground-truth $Sigma$, it is important to understand the approximation error in each individual eigenvector as a function of the number of available samples. The recent results of Kolchinskii and Lounici yield such bounds. In the present paper we sharpen these bounds and show that eigenvectors can often be reconstructed to a required accuracy from a sample of strictly smaller size order.

نظرية الإحصاء نظرية الإحصاء

Initialization of Self-Organizing Maps: Principal Components Versus Random Initialization. A Case Study

153 - A. A. Akinduko , E. M. Mirkes 2012

The performance of the Self-Organizing Map (SOM) algorithm is dependent on the initial weights of the map. The different initialization methods can broadly be classified into random and data analysis based initialization approach. In this paper, the performance of random initialization (RI) approach is compared to that of principal component initialization (PCI) in which the initial map weights are chosen from the space of the principal component. Performance is evaluated by the fraction of variance unexplained (FVU). Datasets were classified into quasi-linear and non-linear and it was observed that RI performed better for non-linear datasets; however the performance of PCI approach remains inconclusive for quasi-linear datasets.

التعلم الالي التعلم الآلي

Compositionality, Synonymy, and the Systematic Representation of Meaning

51 - Shalom Lappin 2000

In a recent issue of Linguistics and Philosophy Kasmi and Pelletier (1998) (K&P), and Westerstahl (1998) criticize Zadroznys (1994) argument that any semantics can be represented compositionally. The argument is based upon Zadroznys theorem that ever y meaning function m can be encoded by a function mu such that (i) for any expression E of a specified language L, m(E) can be recovered from mu(E), and (ii) mu is a homomorphism from the syntactic structures of L to interpretations of L. In both cases, the primary motivation for the objections brought against Zadroznys argument is the view that his encoding of the original meaning function does not properly reflect the synonymy relations posited for the language. In this paper, we argue that these technical criticisms do not go through. In particular, we prove that mu properly encodes synonymy relations, i.e. if two expressions are synonymous, then their compositional meanings are identical. This corrects some misconceptions about the function mu, e.g. Janssen (1997). We suggest that the reason that semanticists have been anxious to preserve compositionality as a significant constraint on semantic theory is that it has been mistakenly regarded as a condition that must be satisfied by any theory that sustains a systematic connection between the meaning of an expression and the meanings of its parts. Recent developments in formal and computational semantics show that systematic theories of meanings need not be compositional.

الحساب واللغة المنطق في علوم الحاسوب

CMB Constraints on Principal Components of the Inflaton Potential

299 - Cora Dvorkin , Wayne Hu 2010

We place functional constraints on the shape of the inflaton potential from the cosmic microwave background through a variant of the generalized slow roll approximation that allows large amplitude, rapidly changing deviations from scale-free conditio ns. Employing a principal component decomposition of the source function G~3(V/V)^2 - 2V/V and keeping only those measured to better than 10% results in 5 nearly independent Gaussian constraints that maybe used to test any single-field inflationary model where such deviations are expected. The first component implies < 3% variations at the 100 Mpc scale. One component shows a 95% CL preference for deviations around the 300 Mpc scale at the ~10% level but the global significance is reduced considering the 5 components examined. This deviation also requires a change in the cold dark matter density which in a flat LCDM model is disfavored by current supernova and Hubble constant data and can be tested with future polarization or high multipole temperature data. Its impact resembles a local running of the tilt from multipoles 30-800 but is only marginally consistent with a constant running beyond this range. For this analysis, we have implemented a ~40x faster WMAP7 likelihood method which we have made publicly available.

علم الكونيات والفيزياء الفلكية Nongalactic فيزياء الطاقة العالية - الظواهر