
On the universal structure of human lexical semantics

Published by HyeJin Youn
Publication date: 2015
Paper language: English





How universal is human conceptual structure? The way concepts are organized in the human brain may reflect distinct features of cultural, historical, and environmental background in addition to properties universal to human cognition. Semantics, or meaning expressed through language, provides direct access to the underlying conceptual structure, but meaning is notoriously difficult to measure, let alone parameterize. Here we provide an empirical measure of semantic proximity between concepts using cross-linguistic dictionaries. Across languages carefully selected from a phylogenetically and geographically stratified sample of genera, translations of words reveal cases where a particular language uses a single polysemous word to express concepts represented by distinct words in another. We use the frequency of polysemies linking two concepts as a measure of their semantic proximity, and represent the pattern of such linkages by a weighted network. This network is highly uneven and fragmented: certain concepts are far more prone to polysemy than others, and naturally interpretable clusters emerge that are only loosely connected to each other. Statistical analysis shows that these structural properties are consistent across different language groups and largely independent of geography, environment, and literacy. It is therefore possible to conclude that the conceptual structure connecting the basic vocabulary studied is primarily due to universal features of human cognition and language use.
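The core measurement (polysemy frequency as an edge weight between concepts) can be sketched in Python. This is a minimal illustration, not the authors' pipeline: it assumes a hypothetical input format in which each language's dictionary maps a word to the set of concepts it translates to, and it weights the edge between two concepts by the number of polysemous words, across all languages, that cover both. The function names `polysemy_network` and `concept_strength` and the toy lexicons are invented for the example.

```python
from collections import defaultdict
from itertools import combinations


def polysemy_network(lexicons):
    """Weighted concept network from per-language lexicons (hypothetical format).

    lexicons: {language: {word: set_of_concepts_the_word_translates_to}}
    Returns {(concept_a, concept_b): weight} with concept_a < concept_b, where
    the weight counts polysemous words, across all languages, covering both.
    """
    weights = defaultdict(int)
    for words in lexicons.values():
        for concepts in words.values():
            # A word covering k concepts links each of its k*(k-1)/2 pairs once.
            for a, b in combinations(sorted(concepts), 2):
                weights[(a, b)] += 1
    return dict(weights)


def concept_strength(weights):
    """Sum of edge weights per concept: a rough index of proneness to polysemy."""
    strength = defaultdict(int)
    for (a, b), w in weights.items():
        strength[a] += w
        strength[b] += w
    return dict(strength)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    lexicons = {
        "lang_A": {"w1": {"SUN", "DAY"}, "w2": {"MOON", "MONTH"}},
        "lang_B": {"w3": {"SUN", "DAY"}, "w4": {"MOON"}},
        "lang_C": {"w5": {"SUN", "DAY", "HEAT"}},
    }
    net = polysemy_network(lexicons)
    print(net[("DAY", "SUN")])           # → 3
    print(concept_strength(net)["DAY"])  # → 4
```

Counting words rather than distinct languages is one of several reasonable choices here; a per-language count would cap each edge's contribution at one per language.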




Read also

Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (1) the words of natural human language possess a universal positivity bias; (2) the estimated emotional content of words is consistent between languages under translation; and (3) this positivity bias is strongly independent of word usage frequency. Alongside these general regularities, we describe inter-language variations in the emotional spectrum of languages which allow us to rank corpora. We also show how our word evaluations can be used to construct physical-like instruments for both real-time and offline measurement of the emotional content of large-scale texts.
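One simple way to turn per-word scores into such an instrument is a frequency-weighted average over the scored words of a text, $h_{\text{avg}} = \sum_i f_i h_i / \sum_i f_i$, where $f_i$ is the count of word $i$ and $h_i$ its score. The sketch below assumes that form; the `scores` dictionary and its values are hypothetical placeholders, not the paper's human evaluations, and the tokenization (lowercasing and splitting on whitespace) is deliberately naive.

```python
from collections import Counter


def average_happiness(text, scores):
    """Frequency-weighted mean score over the words of `text` found in `scores`.

    Words without a score are ignored; returns None if no word is scored.
    """
    counts = Counter(text.lower().split())  # naive whitespace tokenization
    scored = {w: f for w, f in counts.items() if w in scores}
    total = sum(scored.values())
    if total == 0:
        return None
    return sum(f * scores[w] for w, f in scored.items()) / total


if __name__ == "__main__":
    # Hypothetical scores on a 1-9 scale, not values from the study.
    scores = {"love": 8.0, "war": 2.0, "the": 5.0}
    print(average_happiness("The love the war", scores))  # → 5.0
```

Applied to a sliding window over a text stream, the same function would give a crude real-time reading.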
Human movements in the real world and in cyberspace affect not only dynamical processes such as epidemic spreading and information diffusion but also social and economic activities such as urban planning and personalized recommendation in online shopping. Despite recent efforts in characterizing and modeling human behaviors in both the real and cyber worlds, the fundamental dynamics underlying human mobility have not been well understood. We develop a minimal, memory-based random walk model in limited space that reproduces, with a single parameter, the key statistical behaviors characterizing human movements in both spaces. The model is validated using big data from mobile phones and online commerce, suggesting memory-based random walk dynamics as the universal underpinning of human mobility, regardless of whether it occurs in the real world or in cyberspace.
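To make the idea of a memory-based walk in limited space concrete, here is a toy simulation. Its rule is an assumption for illustration and not necessarily the authors' model: with probability `p_return` (the single parameter) the walker revisits a past location chosen in proportion to its visit count, and otherwise it jumps to a uniformly random one of `n_locations` sites. The function name `memory_walk` is hypothetical.

```python
import random
from collections import Counter


def memory_walk(n_locations, steps, p_return, seed=0):
    """Toy memory-based random walk on n_locations sites (assumed rule).

    With probability p_return the walker revisits a past location, chosen in
    proportion to its visit count; otherwise it jumps to a uniformly random
    site. Returns a Counter of visits per location (total == steps, steps >= 1).
    """
    rng = random.Random(seed)
    visits = Counter({rng.randrange(n_locations): 1})  # starting location
    for _ in range(steps - 1):
        if rng.random() < p_return:
            # Memory: favour places that were visited often before.
            locs = list(visits)
            loc = rng.choices(locs, weights=[visits[x] for x in locs])[0]
        else:
            # Exploration within the limited space of n_locations sites.
            loc = rng.randrange(n_locations)
        visits[loc] += 1
    return visits


if __name__ == "__main__":
    visits = memory_walk(n_locations=100, steps=1000, p_return=0.8, seed=42)
    print(sum(visits.values()))   # → 1000
    print(visits.most_common(5))  # most-visited sites; compare across p_return
```

Because revisits are proportional to past visits, the memory step is self-reinforcing, which is what tends to concentrate activity on a few sites as `p_return` grows.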
Modelling the process by which a listener derives the words intended by a speaker requires a hypothesis about how lexical items are stored in memory. This work aims at developing a system that imitates humans when identifying words in running speech and, in this way, provides a framework to better understand human speech processing. We build a speech recognizer for Italian based on the principles of Stevens' model of Lexical Access, in which words are stored as hierarchical arrangements of distinctive features (Stevens, K. N. (2002). Toward a model for lexical access based on acoustic landmarks and distinctive features, J. Acoust. Soc. Am., 111(4):1872-1891). Over the past few decades, the Speech Communication Group at the Massachusetts Institute of Technology (MIT) has developed a speech recognition system for English based on this approach. Italian will be the first language beyond English to be explored; the extension to another language provides the opportunity to test the hypothesis that words are represented in memory as a set of hierarchically arranged distinctive features, and to reveal which of the underlying mechanisms may be language independent. This paper also introduces a new Lexical Access corpus, the LaMIT database, created and labeled specifically for this work, which will be made freely available to the speech research community. Future developments will test the hypothesis that specific acoustic discontinuities, called landmarks, which serve as cues to features, are language independent, while other cues may be language dependent, with powerful implications for understanding how the human brain recognizes speech.
Yile Wang, Leyang Cui, Yue Zhang (2019)
Contextualized embeddings such as BERT can serve as strong input representations to NLP tasks, outperforming their static counterparts such as skip-gram, CBOW and GloVe. However, such embeddings are dynamic, computed from sentence-level context, which limits their use in lexical semantics tasks. We address this issue by using dynamic embeddings as word representations when training static embeddings, thereby leveraging their strong representation power for disambiguating context information. Results show that this method improves over traditional static embeddings on a range of lexical semantics tasks, obtaining the best reported results on seven datasets.
Social networks have attracted much interest in recent years. Here we focus on a network structure derived from co-occurrences of people in traditional newspaper media. We find three clear deviations from what can be expected in a random graph. First, the average degree in the empirical network is much lower than expected, and the average weight of a link much higher than expected. Second, high-degree nodes attract disproportionately more weight. Third, a relatively large share of the weight seems to concentrate between high-degree nodes. We believe this can be explained by the fact that most people tend to co-occur repeatedly with the same people. We create a model that qualitatively replicates these observations based on two self-reinforcing processes: (1) more frequently occurring persons are more likely to occur again; and (2) if two people co-occur frequently, they are more likely to co-occur again. This suggests that the media tend to focus on people who are already in the news, and that they reinforce existing co-occurrences.
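The two self-reinforcing processes can be illustrated with a small simulation in which each step is one article mentioning two people. The exact rules and the parameters `p_new` (chance of introducing a previously unseen person) and `p_repeat` (chance of reusing a past partner) are assumptions made for this sketch, not the authors' specification, and `cooccurrence_model` is a hypothetical name.

```python
import random
from collections import Counter, defaultdict


def cooccurrence_model(steps, p_new=0.1, p_repeat=0.7, seed=0):
    """Toy model of people co-occurring in news articles (assumed rules).

    Each step is one article that mentions two distinct people.
    Rule 1: apart from a p_new chance of a new person, an existing person is
    picked in proportion to past occurrences.
    Rule 2: with probability p_repeat, a person with past partners is paired
    with one of them, in proportion to past co-occurrences.
    Returns (occurrences per person, co-occurrence weight per pair (x, y), x < y).
    """
    rng = random.Random(seed)
    occ = Counter()              # person id -> number of occurrences
    nbrs = defaultdict(Counter)  # person id -> Counter of partners

    def pick_person(exclude):
        candidates = [p for p in occ if p != exclude]
        if not candidates or rng.random() < p_new:
            return len(occ)      # new person; ids stay contiguous from 0
        # Rule 1: preferential choice by past occurrence count.
        return rng.choices(candidates, weights=[occ[p] for p in candidates])[0]

    for _ in range(steps):
        a = pick_person(exclude=None)
        occ[a] += 1
        if nbrs[a] and rng.random() < p_repeat:
            # Rule 2: reuse a past partner, weighted by past co-occurrences.
            partners = list(nbrs[a])
            b = rng.choices(partners, weights=[nbrs[a][p] for p in partners])[0]
        else:
            b = pick_person(exclude=a)
        occ[b] += 1
        nbrs[a][b] += 1
        nbrs[b][a] += 1

    weights = {(x, y): w for x, cnt in nbrs.items() for y, w in cnt.items() if x < y}
    return occ, weights


if __name__ == "__main__":
    occ, weights = cooccurrence_model(500, seed=3)
    print(sum(weights.values()))  # → 500, one pair per article
    print(occ.most_common(3))     # the most frequently mentioned people
```

Comparing the average degree and average link weight of the result against a random graph with the same number of people and articles is one way to probe the deviations described above.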