This study evaluates whether model-based Collaborative Filtering (CF) algorithms, which have been extensively studied and widely used to build recommender systems, can be used to predict which common nouns a predicate can take as its complement. We find that, when trained on verb-noun co-occurrence data drawn from the Corpus of Contemporary American English (COCA), two popular model-based CF algorithms, Singular Value Decomposition and Non-negative Matrix Factorization, perform well on this task, each achieving an AUROC of at least 0.89 and surpassing several different baselines. We then show that the embedding vectors for verbs and nouns learned by the two CF models can be quantized (via k-means clustering) with minimal loss of performance on the prediction task, while using only a small number of verb and noun clusters (relative to the number of distinct verbs and nouns). Finally, we evaluate the alignment between the quantized embedding vectors for verbs and the Levin verb classes, finding that the alignment surpasses several randomized baselines. We conclude by discussing how model-based CF algorithms might be applied to learning restrictions on constituent selection between various lexical categories and how these learned models could then be used to augment a rule-based constituency grammar.
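The factorize-then-quantize pipeline described above can be sketched as follows. This is an illustrative toy, not the authors' code: the co-occurrence counts are invented, and the component and cluster counts are arbitrary assumptions.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.cluster import KMeans

# Toy verb-noun co-occurrence counts: rows = verbs, columns = nouns
# (hypothetical data, standing in for COCA-derived counts).
counts = np.array([
    [5, 3, 0, 0],   # e.g. "eat"
    [4, 4, 0, 1],   # e.g. "bake"
    [0, 0, 6, 2],   # e.g. "drive"
    [0, 1, 5, 3],   # e.g. "park"
], dtype=float)

model = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
verb_vecs = model.fit_transform(counts)   # verb embeddings (4 x 2)
noun_vecs = model.components_.T           # noun embeddings (4 x 2)

# Plausibility score for every verb-noun pair from the factorization.
scores = verb_vecs @ noun_vecs.T

# Quantize the verb embeddings: replace each verb vector with the
# centroid of its k-means cluster, then re-score.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(verb_vecs)
quantized = km.cluster_centers_[km.labels_]
quantized_scores = quantized @ noun_vecs.T
```

SVD would follow the same shape with `TruncatedSVD` in place of `NMF`; the quantization step is identical in either case.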
Performance of NMT systems has been shown to depend on the quality of the training data. In this paper we explore different open-source tools that can be used to score the quality of translation pairs, with the goal of obtaining clean corpora for training NMT models. We measure the performance of these tools by correlating their scores with human judgments, as well as by ranking models trained on the resulting filtered datasets in terms of their performance on different test sets and MT evaluation metrics.
Data filtering for machine translation (MT) describes the task of selecting a subset of a given, possibly noisy corpus with the aim of maximizing the performance of an MT system trained on this selected data. Over the years, many different filtering approaches have been proposed. However, varying task definitions and data conditions make it difficult to draw a meaningful comparison. In the present work, we aim for a more systematic approach to the task at hand. First, we analyze the performance of language identification, a tool commonly used for data filtering in the MT community, and identify specific weaknesses. Based on our findings, we then propose several novel methods for data filtering based on cross-lingual word embeddings. We compare our approaches to one of the winning methods from the WMT 2018 shared task on parallel corpus filtering on three real-life, high-resource MT tasks. We find that said method, which performed very strongly in the WMT shared task, does not perform well within our more realistic task conditions. While we find that our approaches come out on top on all three tasks, different variants perform best on different tasks. Further experiments on the WMT 2020 shared task for parallel corpus filtering show that our methods achieve comparable results to the strongest submissions of this campaign.
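The core of embedding-based filtering can be sketched in a few lines. The embeddings below are hand-constructed stand-ins (the paper's actual cross-lingual embeddings and threshold are not specified here); the point is only the scoring-and-thresholding mechanic.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def filter_pairs(src_embs, tgt_embs, pairs, threshold=0.5):
    """Keep sentence pairs whose cross-lingual embeddings are similar."""
    return [p for p, s, t in zip(pairs, src_embs, tgt_embs)
            if cosine(s, t) >= threshold]

# Toy example: pair "a"/"A" is a good translation, "b"/"B" is noise.
src = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
tgt = [np.array([1.0, 0.1]), np.array([1.0, 0.0])]
kept = filter_pairs(src, tgt, [("a", "A"), ("b", "B")])
```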
In most neural machine translation distillation or stealing scenarios, the highest-scoring hypothesis of the target model (teacher) is used to train a new model (student). If reference translations are also available, then better hypotheses (with respect to the references) can be oversampled and poor hypotheses either removed or undersampled. This paper explores the sampling-method landscape (pruning, hypothesis oversampling and undersampling, deduplication, and their combinations) with English-to-Czech and English-to-German MT models using standard MT evaluation metrics. We show that careful oversampling and combination with the original data lead to better performance than training only on the original or synthesized data or their direct combination.
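The oversampling/pruning idea can be made concrete with a toy resampler. The thresholds and scores below are illustrative assumptions, not the paper's settings; any sentence-level MT metric in [0, 1] could supply the scores.

```python
def resample(hypotheses, keep_below=0.2, oversample_above=0.8):
    """hypotheses: list of (sentence, score) pairs, score in [0, 1].

    Prunes hypotheses scoring below keep_below and duplicates
    (oversamples) hypotheses scoring above oversample_above.
    """
    out = []
    for sent, score in hypotheses:
        if score < keep_below:
            continue                      # prune poor hypotheses
        out.append((sent, score))
        if score > oversample_above:
            out.append((sent, score))     # oversample good hypotheses
    return out

data = [("guten tag", 0.9), ("hallo welt", 0.5), ("xyz", 0.1)]
resampled = resample(data)
```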
We present a system for zero-shot cross-lingual offensive language and hate speech classification. The system was trained on English datasets and tested on the task of detecting hate speech and offensive social media content in a number of languages without any additional training. Experiments show an impressive ability of both models to generalize from English to other languages. There is, however, an expected gap in performance between the tested cross-lingual models and the monolingual models. The best-performing model (the offensive content classifier) is available online as a REST API.
Since electroencephalogram (EEG) signals have very small magnitudes, it is very difficult to capture them without noise produced by surrounding artifacts affecting the real EEG signals, so filters are necessary to remove the noise. This work proposes the design of an electronic circuit, using a microcontroller, an instrumentation amplifier, and an operational amplifier, able to capture EEG signals, convert the captured signals from analog to digital form, and send the converted (digital) signal to a group of three digital filters. This paper presents the design of three digital elliptic filters ready to be used for real-time filtering of EEG signals (which preliminarily represent the condition of the brain), forming the software part that complements the hardware part of the EEG signal capturing system. Finally, we show how to use the designed electronic circuit with the three digital filters, and demonstrate and discuss the results of this work. We used Eagle 6.6 to design and draw the circuit, CodeVision AVR 3.12 to write the program downloaded onto the microcontroller, MathWorks MATLAB 2014a to design the three digital filters, and the MATLAB 2014a Simulink tool to carry out the experiments and obtain the results.
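A digital elliptic filter of the kind described can be sketched with `scipy.signal` (the paper used MATLAB; this is a stand-in). The sampling rate, band edges, order, and ripple figures below are illustrative assumptions, not the paper's specifications.

```python
import numpy as np
from scipy import signal

fs = 256.0  # assumed EEG sampling rate (Hz)

# Elliptic band-pass: 1 dB passband ripple, 40 dB stopband attenuation,
# passband roughly covering typical EEG activity (assumed 0.5-40 Hz).
sos = signal.ellip(N=4, rp=1, rs=40, Wn=[0.5, 40.0], btype="bandpass",
                   fs=fs, output="sos")

# Test signal: a 10 Hz "EEG-like" tone plus 60 Hz mains interference.
t = np.arange(0, 2.0, 1 / fs)
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 60 * t)
y = signal.sosfiltfilt(sos, x)  # zero-phase filtering
```

For true real-time use on a microcontroller stream, `sosfilt` (causal, one pass) would replace the zero-phase `sosfiltfilt` used here for clarity.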
Recommender systems represent a class of systems designed to help individuals deal with information overload or incomplete information. Such systems help individuals by providing recommendations through the use of various personalization techniques. Collaborative filtering is a widely used technique for rating prediction in recommender systems. This paper presents a method that uses preference relations instead of absolute ratings for similarity calculation. The results indicate that the proposed method outperforms other methods such as the Somers coefficient.
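The Somers-coefficient baseline mentioned above can be computed directly with `scipy.stats.somersd`: it compares two users through the concordance of their rating orderings (a preference-relation view) rather than the absolute rating values. The rating vectors below are invented for illustration.

```python
import numpy as np
from scipy.stats import somersd

# Two users' ratings over the same five co-rated items (toy data).
user_a = np.array([5, 4, 2, 1, 3])
user_b = np.array([4, 5, 1, 2, 3])

# Somers' D as an ordinal similarity between the two users:
# +1 = identical orderings, -1 = fully reversed, 0 = unrelated.
sim = somersd(user_a, user_b).statistic
```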
This study suggests a new approach to segmenting ultrasound uterus images to obtain the fetus region. The approach consists of three stages. The first is preprocessing, in which the speckle noise is removed from the ultrasound images by sequential filtering with a Gabor filter and a median filter. Second, an improved active contour, independent of edges, is applied to segment the uterus images. The last stage is post-processing, which relies on morphological operations to eliminate undesired regions and obtain the region of interest (the fetus). The designed system has been tested on a medical database of ultrasound uterus images downloaded from the ULTRASCAN CENTRE site in Kaloor (India). The experimental tests show that the proposed sequential filtering technique improves the performance of the active contour algorithm significantly, so the system segments the uterus images correctly even in the presence of speckle noise.
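The sequential Gabor-then-median preprocessing step can be sketched as below. The kernel size, orientation, and wavelength are illustrative assumptions, and the "image" is synthetic noise rather than an ultrasound scan.

```python
import numpy as np
from scipy import ndimage

def gabor_kernel(size=9, sigma=2.0, theta=0.0, lam=4.0):
    """Real part of a Gabor kernel (parameters are illustrative)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return (np.exp(-(x**2 + y**2) / (2 * sigma**2))
            * np.cos(2 * np.pi * xr / lam))

def despeckle(image):
    """Sequential filtering: Gabor convolution, then a 3x3 median filter."""
    smoothed = ndimage.convolve(image, gabor_kernel(), mode="reflect")
    return ndimage.median_filter(smoothed, size=3)

rng = np.random.default_rng(0)
noisy = np.ones((32, 32)) + 0.3 * rng.standard_normal((32, 32))  # toy speckle
clean = despeckle(noisy)
```

In the paper's pipeline the despeckled image would then be handed to the active contour stage; a bank of such kernels at several orientations is the usual Gabor setup.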
In this approach, we offer the integration of search engines with filtering techniques through a dynamic hybridization of collaborative filtering and content-based filtering, in order to overcome past limitations and improve the precision and recall of retrieved documents. The approach uses a domain ontology model to represent the user profile, reducing the errors and confusion that result from treating the user profile as a single entity, and takes advantage of user activity to adapt the user profile so that it reflects the user's current state.
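At its simplest, the hybridization amounts to combining two relevance scores per document. The linear blend and the fixed weight below are a minimal sketch of the idea only; the paper's dynamic weighting scheme is not specified here.

```python
def hybrid_score(cf_score, content_score, alpha=0.6):
    """Weighted hybrid of a collaborative-filtering score and a
    content-based score, both assumed to lie in [0, 1].
    alpha is a placeholder; a dynamic scheme would adapt it per user."""
    return alpha * cf_score + (1 - alpha) * content_score

# A document strong on collaborative evidence but weak on content match:
score = hybrid_score(0.9, 0.2)
```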
This research aimed to study the effect of the length of the filter operator calculated by inverse filtering on seismic data filtering. All programs used to estimate the seismic signal, calculate the filter operator, and perform the convolution were written by the author. Several experiments on the influence of the filter-operator length on the outcomes of inverse filtering were performed. The seismic data used in this research were measured in two different areas in Syria: the Alsaegh dam near Sueda City and the Rajo dam near Aleppo City. In addition, experiments were carried out to test the effect of shortening the filter operator. The results showed that short filter operators can be used in inverse filtering without a negative effect on the resolution of the seismogram. We also noticed that how far the filter operator can be shortened depends on the length of the calculated filter operator.
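The length trade-off can be illustrated with a standard least-squares ("spiking") inverse filter, a common formulation of inverse filtering in seismic processing; this is a generic sketch, not the author's programs, and the wavelet is a toy minimum-phase example rather than the Syrian field data.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def inverse_operator(wavelet, length):
    """Least-squares filter f of the given length with wavelet * f ~ spike."""
    full = np.correlate(wavelet, wavelet, mode="full")
    lags = full[len(wavelet) - 1:]        # autocorrelation at lags 0, 1, ...
    r = np.zeros(length)                  # first column of the Toeplitz
    n = min(length, len(lags))            # normal-equation matrix
    r[:n] = lags[:n]
    rhs = np.zeros(length)
    rhs[0] = wavelet[0]  # cross-correlation of wavelet with a spike at lag 0
    return solve_toeplitz(r, rhs)

wavelet = np.array([1.0, -0.6, 0.2])      # toy minimum-phase source wavelet
f_short = inverse_operator(wavelet, 5)    # short filter operator
f_long = inverse_operator(wavelet, 40)    # long filter operator
```

Convolving each operator with the wavelet and comparing the result to an ideal spike shows how the residual error shrinks as the operator lengthens, and how quickly a short operator becomes "long enough" depends on the wavelet.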