This study evaluates whether model-based Collaborative Filtering (CF) algorithms, which have been extensively studied and widely used to build recommender systems, can be used to predict which common nouns a predicate can take as its complement. We find that, when trained on verb-noun co-occurrence data drawn from the Corpus of Contemporary American English (COCA), two popular model-based CF algorithms, Singular Value Decomposition and Non-negative Matrix Factorization, perform well on this task, each achieving an AUROC of at least 0.89 and surpassing several different baselines. We then show that the embedding vectors for verbs and nouns learned by the two CF models can be quantized (via k-means clustering) with minimal loss of performance on the prediction task, while using only a small number of verb and noun clusters (relative to the number of distinct verbs and nouns). Finally, we evaluate the alignment between the quantized embedding vectors for verbs and the Levin verb classes, finding that the alignment surpasses several randomized baselines. We conclude by discussing how model-based CF algorithms might be applied to learning restrictions on constituent selection between various lexical categories and how these learned models could then be used to augment a rule-based constituency grammar.
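The factorize-then-quantize pipeline described above can be sketched as follows. This is an illustrative toy, not the authors' code: the co-occurrence counts are invented, and the component and cluster counts are arbitrary assumptions.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.cluster import KMeans

# Toy verb-noun co-occurrence counts: rows = verbs, columns = nouns
# (hypothetical data, standing in for COCA-derived counts).
counts = np.array([
    [5, 3, 0, 0],   # e.g. "eat"
    [4, 4, 0, 1],   # e.g. "bake"
    [0, 0, 6, 2],   # e.g. "drive"
    [0, 1, 5, 3],   # e.g. "park"
], dtype=float)

model = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
verb_vecs = model.fit_transform(counts)   # verb embeddings (4 x 2)
noun_vecs = model.components_.T           # noun embeddings (4 x 2)

# Plausibility score for every verb-noun pair from the factorization.
scores = verb_vecs @ noun_vecs.T

# Quantize the verb embeddings: replace each verb vector with the
# centroid of its k-means cluster, then re-score.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(verb_vecs)
quantized = km.cluster_centers_[km.labels_]
quantized_scores = quantized @ noun_vecs.T
```

SVD would follow the same shape with `TruncatedSVD` in place of `NMF`; the quantization step is identical in either case.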
Performance of NMT systems has been shown to depend on the quality of the training data. In this paper we explore different open-source tools that can be used to score the quality of translation pairs, with the goal of obtaining clean corpora for training NMT models. We measure the performance of these tools by correlating their scores with human judgments, as well as by ranking models trained on the resulting filtered datasets in terms of their performance on different test sets and MT evaluation metrics.
Data filtering for machine translation (MT) describes the task of selecting a subset of a given, possibly noisy corpus with the aim of maximizing the performance of an MT system trained on this selected data. Over the years, many different filtering approaches have been proposed. However, varying task definitions and data conditions make it difficult to draw a meaningful comparison. In the present work, we aim for a more systematic approach to the task at hand. First, we analyze the performance of language identification, a tool commonly used for data filtering in the MT community, and identify specific weaknesses. Based on our findings, we then propose several novel methods for data filtering based on cross-lingual word embeddings. We compare our approaches to one of the winning methods from the WMT 2018 shared task on parallel corpus filtering on three real-life, high-resource MT tasks. We find that said method, which performed very strongly in the WMT shared task, does not perform well within our more realistic task conditions. While we find that our approaches come out on top on all three tasks, different variants perform best on different tasks. Further experiments on the WMT 2020 shared task for parallel corpus filtering show that our methods achieve comparable results to the strongest submissions of this campaign.
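The core of embedding-based filtering can be sketched in a few lines. The embeddings below are hand-constructed stand-ins (the paper's actual cross-lingual embeddings and threshold are not specified here); the point is only the scoring-and-thresholding mechanic.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def filter_pairs(src_embs, tgt_embs, pairs, threshold=0.5):
    """Keep sentence pairs whose cross-lingual embeddings are similar."""
    return [p for p, s, t in zip(pairs, src_embs, tgt_embs)
            if cosine(s, t) >= threshold]

# Toy example: pair "a"/"A" is a good translation, "b"/"B" is noise.
src = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
tgt = [np.array([1.0, 0.1]), np.array([1.0, 0.0])]
kept = filter_pairs(src, tgt, [("a", "A"), ("b", "B")])
```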
In most neural machine translation distillation or stealing scenarios, the highest-scoring hypothesis of the target model (teacher) is used to train a new model (student). If reference translations are also available, then better hypotheses (with respect to the references) can be oversampled and poor hypotheses either removed or undersampled. This paper explores the sampling-method landscape (pruning, hypothesis oversampling and undersampling, deduplication, and their combinations) with English-to-Czech and English-to-German MT models using standard MT evaluation metrics. We show that careful oversampling and combination with the original data lead to better performance than training only on the original or synthesized data or their direct combination.
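The oversampling/pruning idea can be made concrete with a toy resampler. The thresholds and scores below are illustrative assumptions, not the paper's settings; any sentence-level MT metric in [0, 1] could supply the scores.

```python
def resample(hypotheses, keep_below=0.2, oversample_above=0.8):
    """hypotheses: list of (sentence, score) pairs, score in [0, 1].

    Prunes hypotheses scoring below keep_below and duplicates
    (oversamples) hypotheses scoring above oversample_above.
    """
    out = []
    for sent, score in hypotheses:
        if score < keep_below:
            continue                      # prune poor hypotheses
        out.append((sent, score))
        if score > oversample_above:
            out.append((sent, score))     # oversample good hypotheses
    return out

data = [("guten tag", 0.9), ("hallo welt", 0.5), ("xyz", 0.1)]
resampled = resample(data)
```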
We present a system for zero-shot cross-lingual offensive language and hate speech classification. The system was trained on English datasets and tested on the task of detecting hate speech and offensive social media content in a number of languages without any additional training. Experiments show an impressive ability of both models to generalize from English to other languages. There is, however, an expected gap in performance between the tested cross-lingual models and the monolingual models. The best-performing model (the offensive content classifier) is available online as a REST API.
Since electroencephalogram (EEG) signals have very small magnitudes, it is very difficult to capture them without noise produced by surrounding artifacts affecting the real EEG signals, so filters are necessary to remove the noise. This work proposes the design of an electronic circuit, using a microcontroller, an instrumentation amplifier, and an operational amplifier, able to capture EEG signals, convert the captured signals from analog to digital form, and send the converted (digital) signal to a group of three digital filters. This paper presents the design of three digital elliptic filters ready to be used for real-time filtering of EEG signals (which preliminarily represent the condition of the brain), forming the software part that complements the hardware part of the EEG signal capturing system. Finally, we show how to use the designed electronic circuit with the three digital filters, and demonstrate and discuss the results of this work. We used Eagle 6.6 to design and draw the circuit, CodeVision AVR 3.12 to write the program downloaded onto the microcontroller, MathWorks MATLAB 2014a to design the three digital filters, and the MATLAB 2014a Simulink tool to carry out the experiments and obtain the results.
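A digital elliptic filter of the kind described can be sketched with `scipy.signal` (the paper used MATLAB; this is a stand-in). The sampling rate, band edges, order, and ripple figures below are illustrative assumptions, not the paper's specifications.

```python
import numpy as np
from scipy import signal

fs = 256.0  # assumed EEG sampling rate (Hz)

# Elliptic band-pass: 1 dB passband ripple, 40 dB stopband attenuation,
# passband roughly covering typical EEG activity (assumed 0.5-40 Hz).
sos = signal.ellip(N=4, rp=1, rs=40, Wn=[0.5, 40.0], btype="bandpass",
                   fs=fs, output="sos")

# Test signal: a 10 Hz "EEG-like" tone plus 60 Hz mains interference.
t = np.arange(0, 2.0, 1 / fs)
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 60 * t)
y = signal.sosfiltfilt(sos, x)  # zero-phase filtering
```

For true real-time use on a microcontroller stream, `sosfilt` (causal, one pass) would replace the zero-phase `sosfiltfilt` used here for clarity.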
Recommender systems represent a class of systems designed to help individuals deal with information overload or incomplete information. Such systems help individuals by providing recommendations through the use of various personalization techniques. Collaborative filtering is a widely used technique for rating prediction in recommender systems. This paper presents a method that uses preference relations instead of absolute ratings for similarity calculation. The results indicate that the proposed method outperforms other methods such as the Somers coefficient.
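The Somers-coefficient baseline mentioned above can be computed directly with `scipy.stats.somersd`: it compares two users through the concordance of their rating orderings (a preference-relation view) rather than the absolute rating values. The rating vectors below are invented for illustration.

```python
import numpy as np
from scipy.stats import somersd

# Two users' ratings over the same five co-rated items (toy data).
user_a = np.array([5, 4, 2, 1, 3])
user_b = np.array([4, 5, 1, 2, 3])

# Somers' D as an ordinal similarity between the two users:
# +1 = identical orderings, -1 = fully reversed, 0 = unrelated.
sim = somersd(user_a, user_b).statistic
```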
This study suggests a new approach to segmenting ultrasound uterus images to obtain the fetus region. The approach consists of three stages. The first is preprocessing, in which the speckle noise is removed from the ultrasound images by sequential filtering with a Gabor filter and a median filter. Second, an improved active contour, independent of edges, is applied to segment the uterus images. The last stage is post-processing, which relies on morphological operations to eliminate undesired regions and obtain the region of interest (the fetus). The designed system has been tested on a medical database of ultrasound uterus images downloaded from the ULTRASCAN CENTRE site in Kaloor (India). The experimental tests show that the proposed sequential filtering technique improves the performance of the active contour algorithm significantly, so the system segments the uterus images correctly even in the presence of speckle noise.
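The sequential Gabor-then-median preprocessing step can be sketched as below. The kernel size, orientation, and wavelength are illustrative assumptions, and the "image" is synthetic noise rather than an ultrasound scan.

```python
import numpy as np
from scipy import ndimage

def gabor_kernel(size=9, sigma=2.0, theta=0.0, lam=4.0):
    """Real part of a Gabor kernel (parameters are illustrative)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return (np.exp(-(x**2 + y**2) / (2 * sigma**2))
            * np.cos(2 * np.pi * xr / lam))

def despeckle(image):
    """Sequential filtering: Gabor convolution, then a 3x3 median filter."""
    smoothed = ndimage.convolve(image, gabor_kernel(), mode="reflect")
    return ndimage.median_filter(smoothed, size=3)

rng = np.random.default_rng(0)
noisy = np.ones((32, 32)) + 0.3 * rng.standard_normal((32, 32))  # toy speckle
clean = despeckle(noisy)
```

In the paper's pipeline the despeckled image would then be handed to the active contour stage; a bank of such kernels at several orientations is the usual Gabor setup.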
In this approach, we offer the integration of search engines with filtering techniques through a dynamic hybridization of collaborative filtering and content-based filtering, in order to overcome past limitations and improve the precision and recall of retrieved documents. The approach uses a domain ontology model to represent the user profile, reducing the errors and confusion that result from treating the user profile as a single entity, and takes advantage of user activity to adapt the user profile so that it reflects the user's current state.
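At its simplest, the hybridization amounts to combining two relevance scores per document. The linear blend and the fixed weight below are a minimal sketch of the idea only; the paper's dynamic weighting scheme is not specified here.

```python
def hybrid_score(cf_score, content_score, alpha=0.6):
    """Weighted hybrid of a collaborative-filtering score and a
    content-based score, both assumed to lie in [0, 1].
    alpha is a placeholder; a dynamic scheme would adapt it per user."""
    return alpha * cf_score + (1 - alpha) * content_score

# A document strong on collaborative evidence but weak on content match:
score = hybrid_score(0.9, 0.2)
```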
This research aimed to study the effect of the length of the filter operator calculated by inverse filtering on seismic data filtering. All programs used to estimate the seismic signal, calculate the filter operator, and perform the convolution were written by the author. Several experiments on the influence of the filter-operator length on the outcomes of inverse filtering were performed. The seismic data used in this research were measured in two different areas in Syria: the Alsaegh dam near Sueda City and the Rajo dam near Aleppo City. In addition, experiments were carried out to test the effect of shortening the filter operator. The results showed that short filter operators can be used in inverse filtering without a negative effect on the resolution of the seismogram. We also noticed that how far the filter operator can be shortened depends on the length of the calculated filter operator.
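The length trade-off can be illustrated with a standard least-squares ("spiking") inverse filter, a common formulation of inverse filtering in seismic processing; this is a generic sketch, not the author's programs, and the wavelet is a toy minimum-phase example rather than the Syrian field data.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def inverse_operator(wavelet, length):
    """Least-squares filter f of the given length with wavelet * f ~ spike."""
    full = np.correlate(wavelet, wavelet, mode="full")
    lags = full[len(wavelet) - 1:]        # autocorrelation at lags 0, 1, ...
    r = np.zeros(length)                  # first column of the Toeplitz
    n = min(length, len(lags))            # normal-equation matrix
    r[:n] = lags[:n]
    rhs = np.zeros(length)
    rhs[0] = wavelet[0]  # cross-correlation of wavelet with a spike at lag 0
    return solve_toeplitz(r, rhs)

wavelet = np.array([1.0, -0.6, 0.2])      # toy minimum-phase source wavelet
f_short = inverse_operator(wavelet, 5)    # short filter operator
f_long = inverse_operator(wavelet, 40)    # long filter operator
```

Convolving each operator with the wavelet and comparing the result to an ideal spike shows how the residual error shrinks as the operator lengthens, and how quickly a short operator becomes "long enough" depends on the wavelet.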