Popular approaches to natural language processing create word embeddings from textual co-occurrence patterns but often ignore embodied, sensory aspects of language. Here, we introduce the Python package comp-syn, which provides grounded word embeddings based on the perceptually uniform color distributions of Google Image search results. We demonstrate that comp-syn significantly enriches models of distributional semantics. In particular, we show that (1) using low-dimensional word-color embeddings, comp-syn predicts human judgments of word concreteness more accurately and more interpretably than word2vec, and (2) comp-syn performs comparably to word2vec on a metaphorical vs. literal word-pair classification task. comp-syn is open-source on PyPI and is compatible with mainstream machine-learning Python packages. Our package release includes word-color embeddings for over 40,000 English words, each associated with crowd-sourced word concreteness judgments.
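The abstract does not spell out how a word's color distribution is built or compared, so the following is only a minimal, hypothetical sketch of the general idea, not comp-syn's API or its exact method. It assumes each word comes with a list of 8-bit sRGB pixels from its image results, uses CIELAB as a stand-in perceptually uniform color space (the package may use a different one), treats a coarse normalized histogram as the word-color embedding, and compares two words with the Jensen-Shannon divergence. The function names `srgb_to_lab`, `color_distribution` and `js_divergence` are our own.

```python
import math
from collections import Counter


def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIELAB coordinates (D65 white point)."""

    def linearize(v):
        # Undo the sRGB gamma curve to get linear-light intensity in [0, 1].
        c = v / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)
    # Linear sRGB -> CIE XYZ using the standard D65 matrix.
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        # CIELAB companding function with its linear segment near zero.
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def color_distribution(pixels, bins=4):
    """Normalized 3-D histogram over (L*, a*, b*): a hypothetical word-color embedding."""
    counts = Counter()
    for r, g, b in pixels:
        light, a, bb = srgb_to_lab(r, g, b)
        # Clamp each coordinate to an assumed range, then map it to a bin index.
        i = min(int(max(light, 0.0) / 100 * bins), bins - 1)
        j = min(int((min(max(a, -128.0), 127.0) + 128) / 256 * bins), bins - 1)
        k = min(int((min(max(bb, -128.0), 127.0) + 128) / 256 * bins), bins - 1)
        counts[(i, j, k)] += 1
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two sparse distributions."""
    cells = set(p) | set(q)
    m = {c: 0.5 * (p.get(c, 0.0) + q.get(c, 0.0)) for c in cells}

    def kl_to_m(d):
        return sum(v * math.log(v / m[c]) for c, v in d.items() if v > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


# Toy "image search results" for two words (made-up pixel lists, not real data).
snow = color_distribution([(250, 250, 250), (235, 240, 245), (255, 255, 255)])
coal = color_distribution([(10, 10, 10), (30, 30, 35), (0, 0, 0)])
print(js_divergence(snow, coal))
```

Because the two toy histograms share no bins (all "snow" pixels fall in the brightest lightness bin, all "coal" pixels in the darkest), the divergence reaches its maximum of ln 2; words with similar color profiles would score closer to 0.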
We propose a new supervised learning algorithm for classification and regression problems in which two or more preliminary predictors are available. We introduce KernelCobra, a non-linear learning strategy for combining an arbitrary number of initial predictors. KernelCobra builds on the COBRA algorithm introduced by Biau et al. (2016), which combines estimators based on the proximity of their predictions on the training data. Whereas COBRA uses a binary threshold to decide which training points are close enough to be used, we generalize this idea by using a kernel to better capture the proximity information. Such a smoothing kernel assigns more representative weights to the training points used to build the final aggregate predictor, and KernelCobra systematically outperforms the COBRA algorithm. While COBRA is intended for regression, KernelCobra handles both classification and regression. KernelCobra is included in the open-source Python package Pycobra (0.2.4 and onward), introduced by Guedj and Srinivasa Desikan (2018). Numerical experiments on real-life and synthetic datasets assess the performance of KernelCobra in terms of both prediction accuracy and computational complexity.
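The difference between COBRA's binary threshold and a kernel-weighted aggregate can be illustrated with a short sketch. This is not the Pycobra API and covers only regression; it assumes the preliminary predictors are already trained and given as plain Python callables, that proximity is the Euclidean distance between vectors of their predictions, and that the kernel is Gaussian with a hypothetical `bandwidth` parameter. The COBRA variant shown uses the simplest rule (all predictors must agree within `eps`).

```python
import math


def prediction_vector(predictors, x):
    """Stack the outputs of the preliminary predictors at point x."""
    return [r(x) for r in predictors]


def cobra_regress(predictors, x, train_x, train_y, eps):
    """COBRA-style aggregate: average y_i over points whose predictions are all within eps."""
    rx = prediction_vector(predictors, x)
    selected = [
        yi
        for xi, yi in zip(train_x, train_y)
        if all(abs(a - b) <= eps for a, b in zip(rx, prediction_vector(predictors, xi)))
    ]
    # A hard threshold can select nothing; signal that with None.
    return sum(selected) / len(selected) if selected else None


def kernel_cobra_regress(predictors, x, train_x, train_y, bandwidth=1.0):
    """Kernel-weighted aggregate: every training point contributes, weighted by proximity."""
    rx = prediction_vector(predictors, x)
    weights = []
    for xi in train_x:
        ri = prediction_vector(predictors, xi)
        dist2 = sum((a - b) ** 2 for a, b in zip(rx, ri))
        # Gaussian kernel on the squared Euclidean distance between prediction vectors.
        weights.append(math.exp(-dist2 / (2 * bandwidth ** 2)))
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed; fall back to the plain mean of the responses.
        return sum(train_y) / len(train_y)
    return sum(w * yi for w, yi in zip(weights, train_y)) / total


# Two hypothetical, already-trained predictors and a tiny aggregation set.
predictors = [lambda x: x, lambda x: x + 0.1]
train_x, train_y = [0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0]
print(cobra_regress(predictors, 1.5, train_x, train_y, eps=0.1))
print(kernel_cobra_regress(predictors, 1.5, train_x, train_y))
```

With `eps=0.1`, no training point passes the hard threshold, so the COBRA-style function returns `None`, while the kernel version still produces a prediction of about 1.5 by smoothly down-weighting distant points. The choice of kernel and bandwidth here is illustrative only.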