
DialectGram: Detecting Dialectal Variation at Multiple Geographic Resolutions

Published by Hang Jiang
Publication date: 2019
Research field: Informatics Engineering
Paper language: English





Several computational models have been developed in recent years to detect and analyze dialect variation. Most of these models assume a predefined set of geographical regions over which they detect and analyze dialectal variation. However, dialect variation occurs at multiple levels of geographic resolution, ranging from cities within a state, to states within a country, to countries across continents. In this work, we propose a model that enables detection of dialectal variation at multiple levels of geographic resolution, obviating the need for an a priori definition of the resolution level. Our method, DialectGram, learns dialect-sensitive word embeddings while being agnostic of the geographic resolution. Specifically, it requires only one-time training and enables analysis of dialectal variation at a chosen resolution post hoc, a significant departure from prior models, which need to be retrained whenever the predefined set of regions changes. Furthermore, DialectGram explicitly models senses, thus enabling one to estimate the proportion of each sense usage in any given region. Finally, we quantitatively evaluate our model against other baselines on a new evaluation dataset, DialectSim (in English), and show that DialectGram can effectively model linguistic variation.
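The post hoc analysis described in the abstract can be illustrated with a minimal sketch. It is not DialectGram itself: it assumes each usage of a word has already been assigned a sense label by some trained sense-aware model, and it only shows how sense proportions could then be aggregated at whichever resolution is chosen afterwards, without retraining. The records, the `LEVELS` mapping and the `sense_proportions` function are all hypothetical names invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical usages of a single word: (country, state, city, sense_id).
# The sense_id is assumed to come from an already-trained sense-aware model;
# these records are made up for illustration, not taken from DialectSim.
USAGES = [
    ("US", "CA", "Los Angeles", 0),
    ("US", "CA", "San Francisco", 0),
    ("US", "NY", "New York", 1),
    ("UK", "England", "London", 1),
    ("UK", "England", "London", 1),
    ("UK", "Scotland", "Glasgow", 0),
]

# How many leading location fields identify a region at each resolution.
LEVELS = {"country": 1, "state": 2, "city": 3}


def sense_proportions(usages, resolution):
    """Return {region: {sense_id: proportion}} at the chosen resolution."""
    depth = LEVELS[resolution]
    counts = defaultdict(Counter)
    for *location, sense in usages:
        region = tuple(location[:depth])  # e.g. ("US",) or ("US", "CA")
        counts[region][sense] += 1
    return {
        region: {s: n / sum(c.values()) for s, n in c.items()}
        for region, c in counts.items()
    }


if __name__ == "__main__":
    # The same labelled usages are re-aggregated at two resolutions.
    for level in ("country", "state"):
        print(level, sense_proportions(USAGES, level))
```

Because the region key is computed only at aggregation time, switching from countries to states is a change of argument and does not require relabelling the usages.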




Read also

Providing timely accessibility reminders for a point-of-interest (POI) plays a vital role in improving user satisfaction with finding places and making visiting decisions. However, it is difficult to keep the POI database in sync with its real-world counterparts due to the dynamic nature of business changes. To alleviate this problem, we formulate and present a practical solution that jointly extracts POI mentions and identifies their coupled accessibility labels from unstructured text. We approach this task as a sequence tagging problem, where the goal is to produce <POI name, accessibility label> pairs from unstructured text. This task is challenging because of two main issues: (1) POI names are often newly coined words, chosen so as to successfully register new entities or brands, and (2) there may exist multiple pairs in the text, which necessitates dealing with one-to-many or many-to-one mappings to couple each POI with its accessibility label. To this end, we propose a Geographic-Enhanced and Dependency-guIded sequence Tagging (GEDIT) model to concurrently address the two challenges. First, to alleviate challenge #1, we develop a geographic-enhanced pre-trained model to learn the text representations. Second, to mitigate challenge #2, we apply a relational graph convolutional network to learn the tree node representations from the parsed dependency tree. Finally, we construct a neural sequence tagging model by integrating and feeding the previously pre-learned representations into a CRF layer. Extensive experiments conducted on a real-world dataset demonstrate the superiority and effectiveness of GEDIT. In addition, it has already been deployed in production at Baidu Maps. Statistics show that the proposed solution can save significant human effort and labor costs when dealing with the same number of documents, which confirms that it is a practical way to maintain POI accessibility.
206 - Nilo Pedrazzini 2020
This paper explores the possibility of improving the performance of specialized parsers for pre-modern Slavic by training them on data from different related varieties. Because of their linguistic heterogeneity, pre-modern Slavic varieties are treated as low-resource historical languages, whereby cross-dialectal treebank data may be exploited to overcome data scarcity and attempt the training of a variety-agnostic parser. Previous experiments on early Slavic dependency parsing are discussed, particularly with regard to their ability to tackle different orthographic, regional and stylistic features. A generic pre-modern Slavic parser and two specialized parsers (one for East Slavic and one for South Slavic) are trained using jPTDP (Nguyen & Verspoor 2018), a neural network model for joint part-of-speech (POS) tagging and dependency parsing which had shown promising results on a number of Universal Dependency (UD) treebanks, including Old Church Slavonic (OCS). With these experiments, a new state of the art is obtained for both OCS (83.79% unlabelled attachment score (UAS) and 78.43% labelled attachment score (LAS)) and Old East Slavic (OES) (85.7% UAS and 80.16% LAS).
In the area of geographic information processing, there is little research on geographic text classification, and applications of this task to Chinese are relatively rare. In our work, we implement a method to extract text containing geographical entities from a large volume of online text. The geographic information in these texts is of great practical significance to transportation, urban and rural planning, disaster relief and other fields. We use a graph convolutional neural network with an attention mechanism to achieve this. Graph attention networks (GAT) are an improvement on graph convolutional networks (GCN): compared with GCN, the advantage of GAT is that an attention mechanism weights the sum of the features of adjacent nodes. In addition, we construct a Chinese dataset for geographical classification from multiple Chinese text classification datasets. The geoGAT model we used reached a Macro-F score of 95% on the new Chinese dataset.
Convective and radiative cooling are the two principal mechanisms by which the Earth's surface transfers heat into the atmosphere and that shape surface temperature. However, this partitioning is not sufficiently constrained by energy and mass balances alone. We use a simple energy balance model in which convective fluxes and surface temperatures are determined with the additional thermodynamic limit of maximum convective power. We then show that the broad geographic variation of heat fluxes and surface temperatures in the climatological mean compares very well with the ERA-Interim reanalysis over land and ocean. We also show that the estimates depend considerably on the formulation of longwave radiative transfer and that a spatially uniform offset is related to the assumed cold temperature sink at which the heat engine operates.
The widely used adaptive HTTP streaming requires an efficient algorithm to encode the same video at different resolutions. In this paper, we propose a fast block structure determination algorithm based on the AV1 codec that accelerates high-resolution encoding, which is the bottleneck of multi-resolution encoding. The block structure similarity across resolutions is modeled by the fineness of frame detail and the scale of object motions; this enables us to accelerate high-resolution encoding based on low-resolution encoding results. The average depth of a block's co-located neighborhood is used to decide early termination in the RDO process. Encoding results show that our proposed algorithm reduces encoding time by 30.1%-36.8%, while keeping BD-rate low at 0.71%-1.04%. Compared to the state of the art, our method halves performance loss without sacrificing time savings.