In this paper, we present our systems submitted to SemEval-2021 Task 1 on lexical complexity prediction. The aim of this shared task was to create systems able to predict the lexical complexity of word tokens and bigram multiword expressions within a given sentence context, expressed as a continuous value indicating how difficult the respective utterance is to understand. Our approach relies on gradient-boosted regression tree ensembles fitted on a heterogeneous feature set that combines linguistic features, static and contextualized word embeddings, psycholinguistic norm lexica, WordNet, word and character bigram frequencies, and inclusion in word lists, yielding a model able to assign a word or multiword expression a context-dependent complexity score. We show that contextualized string embeddings in particular help with predicting lexical complexity.
This paper describes our contribution to SemEval 2021 Task 1 (Shardlow et al., 2021): Lexical Complexity Prediction. In our approach, we leverage the ELECTRA model and attempt to mirror the data annotation scheme. Although the task is a regression task, we show that we can treat it as an aggregation of several classification and regression models. This somewhat counter-intuitive approach achieved an MAE of 0.0654 on Sub-Task 1 and an MAE of 0.0811 on Sub-Task 2. Additionally, we used weak supervision signals from Gloss-BERT, which significantly improved the MAE on Sub-Task 1.
This paper describes our submission to SemEval'21 Task 7, HaHackathon: Detecting and Rating Humor and Offense. In this challenge, we explore intermediate finetuning, backtranslation augmentation, multitask learning, and ensembling of different language models. Curiously, intermediate finetuning and backtranslation do not improve performance, while multitask learning and ensembling do. We explore why intermediate finetuning and backtranslation do not provide the same benefit as in other natural language processing tasks and offer insight into the errors that our model makes. Our best performing system ranks 7th on Task 1b with an RMSE of 0.5339.
Recently, a class of tracking techniques called "tracking by detection" has been shown to give promising results at real-time speeds. These methods train a discriminative classifier in an online manner to separate the object from the background. This classifier bootstraps itself by using the current tracker state to extract positive and negative examples from the current frame. Slight inaccuracies in the tracker can therefore lead to incorrectly labeled training examples, which degrade the classifier and can cause drift. In this paper, we show that using Simple Online and Realtime Tracking (SORT), which is a pragmatic approach to multiple object tracking with a focus on simple, effective algorithms
Linear regression methods impose strong constraints on regression models, especially on
the error terms, which are assumed to be independent and normally distributed. These
assumptions may not be satisfied in many studies, leading to a non-negligible bias relative
to the actual model, which affects the credibility of the study.
In this paper, we address the problem of estimating the regression function using the
Nadaraya-Watson kernel and k-nearest neighbor estimators as alternatives to parametric
linear regression estimators. We conducted a simulation study on an imposed model,
comparing these methods using the statistical programming language R, with the mean
squared error (MSE) used to determine the best estimate.
The results of the simulation study indicate the effectiveness and efficiency of the
nonparametric estimators in representing the regression function as compared to linear
regression estimators, and indicate that the two nonparametric estimators perform
comparably.
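The comparison described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study; the sine-shaped true function, the noise level, the Gaussian kernel, the bandwidth, and the value of k are all assumptions chosen for the example, not values taken from the paper.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_eval, bandwidth):
    """Gaussian-kernel Nadaraya-Watson estimate at each point of x_eval."""
    scaled = (x_eval[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * scaled ** 2)
    # Kernel-weighted average of the training responses.
    return (weights @ y_train) / weights.sum(axis=1)

def knn_regress(x_train, y_train, x_eval, k):
    """Average the responses of the k nearest training points."""
    dists = np.abs(x_eval[:, None] - x_train[None, :])
    nearest = np.argsort(dists, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

def linear_fit_predict(x_train, y_train, x_eval):
    """Ordinary least squares straight line, evaluated on x_eval."""
    slope, intercept = np.polyfit(x_train, y_train, deg=1)
    return slope * x_eval + intercept

def mse(estimate, truth):
    return float(np.mean((estimate - truth) ** 2))

def run_simulation(n=200, noise_sd=0.3, bandwidth=0.3, k=15, seed=0):
    """Simulate y = sin(x) + noise (an assumed model) and score each estimator."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 2.0 * np.pi, n)
    y = np.sin(x) + rng.normal(0.0, noise_sd, n)
    # Evaluate away from the boundaries, where kernel estimators are biased.
    grid = np.linspace(0.5, 2.0 * np.pi - 0.5, 100)
    truth = np.sin(grid)
    return {
        "linear": mse(linear_fit_predict(x, y, grid), truth),
        "nadaraya_watson": mse(nadaraya_watson(x, y, grid, bandwidth), truth),
        "knn": mse(knn_regress(x, y, grid, k), truth),
    }

results = run_simulation()
for name, value in results.items():
    print(f"{name}: MSE = {value:.4f}")
```

Because the assumed true function is strongly nonlinear, the straight-line fit carries a large bias here; with a model closer to linear the ranking could differ.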
This research aims to predict the level of air pollution from a data set by fitting several models, comparing them, and selecting the one that gives the best prediction.
This paper studies the estimation of the coefficients of the simple linear regression
equation using the least squares method at different sample sizes and with different
sampling methods. The main goal of this research is to determine the optimum sample size
and the best sampling method for estimating these coefficients. We used experimental data
for a population consisting of 2000 students from different schools all over the country.
We changed the sample size each time, calculated the coefficients, and then compared them
with the coefficients for the real population. The results show that the estimated
coefficients are close to the population values when the sample size approaches 325. They
also show that stratified random sampling with allocation proportional to stratum sizes
gives the best and most accurate estimates of the linear regression equation with the
least squares method.
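A minimal sketch of this kind of experiment is given below. The population is synthetic: the four strata, their proportions, and the linear relationship between x and y are assumptions made for illustration, since the student data are not available here, and the helper names are hypothetical.

```python
import numpy as np

def ols_coefficients(x, y):
    """Least squares intercept and slope of y on x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return intercept, slope

def stratified_proportional_sample(rng, strata, n_sample):
    """Draw from each stratum in proportion to its share of the population."""
    chosen = []
    for label in np.unique(strata):
        members = np.flatnonzero(strata == label)
        share = max(1, round(n_sample * len(members) / len(strata)))
        chosen.append(rng.choice(members, size=share, replace=False))
    return np.concatenate(chosen)

def mean_slope_error(rng, x, y, strata, n_sample, stratified, reps=200):
    """Average absolute gap between the sample slope and the population slope."""
    _, true_slope = ols_coefficients(x, y)
    errors = []
    for _ in range(reps):
        if stratified:
            idx = stratified_proportional_sample(rng, strata, n_sample)
        else:
            idx = rng.choice(len(x), size=n_sample, replace=False)
        _, slope = ols_coefficients(x[idx], y[idx])
        errors.append(abs(slope - true_slope))
    return float(np.mean(errors))

# Synthetic population of 2000 units in four strata (assumed, not the real data).
rng = np.random.default_rng(0)
N = 2000
strata = rng.choice(4, size=N, p=[0.4, 0.3, 0.2, 0.1])
x = rng.normal(10 + 2 * strata, 2.0)
y = 1.5 + 0.8 * x + rng.normal(0.0, 1.0, N)

errors = {
    (n, stratified): mean_slope_error(rng, x, y, strata, n, stratified)
    for n in (25, 100, 325, 1000)
    for stratified in (False, True)
}
for (n, stratified), err in errors.items():
    scheme = "stratified" if stratified else "simple random"
    print(f"n={n:4d} {scheme:13s} mean |slope error| = {err:.4f}")
```

The sketch only shows how the sample coefficients approach the population coefficients as the sample grows; it is not intended to reproduce the reported threshold of 325 or the ranking of sampling methods.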
In this research, we developed an algorithm to measure
the quality of processed digital video, with no
information about the video before processing, in order
to determine how much the video was distorted as a
result of processing.
The research aimed at studying the impact of the most
important economic and social factors affecting the adoption of new
irrigation techniques, namely collective water management in the
Al-Ghab basin in Syria. The research was carried out by taking a
simple random sample of 264 farmers. Because the dependent variable
is dichotomous (1 = adoption of collective water management,
0 = otherwise), binary logistic regression was used.
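As an illustration of the technique, binary logistic regression can be fitted by Newton-Raphson iterations as sketched below. The two predictors and the coefficients used to generate the data are hypothetical; only the sample size of 264 and the 0/1 coding of the dependent variable come from the text above.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit binary logistic regression by Newton-Raphson; returns [intercept, slopes...]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))   # predicted adoption probability
        w = p * (1.0 - p)                      # variance of each Bernoulli response
        gradient = X1.T @ (y - p)
        hessian = X1.T @ (X1 * w[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

# Synthetic data: 264 observations, two standardized predictors (assumed).
rng = np.random.default_rng(0)
n = 264
X = rng.normal(size=(n, 2))
true_beta = np.array([-0.5, 1.2, -0.8])        # hypothetical coefficients
logits = true_beta[0] + X @ true_beta[1:]
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

beta_hat = fit_logistic(X, y)
odds_ratios = np.exp(beta_hat[1:])             # multiplicative effect on the odds
print("coefficients:", np.round(beta_hat, 3))
print("odds ratios: ", np.round(odds_ratios, 3))
```

An odds ratio above 1 means that a one-unit increase in the predictor raises the odds of adoption, and a value below 1 means that it lowers them.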
In this research, we studied the problem of multicollinearity among
the independent variables in the multiple regression model.
Multicollinearity violates one of the essential conditions of the
multiple regression model and leads to incorrect results.
We first present a documented theoretical study of the kinds of
multicollinearity, the causes of the problem in the multiple
regression model, and some methods to detect it.
In addition, we review some methods for treating multicollinearity in
the multiple regression model; we then introduce a new method to treat
multicollinearity and apply it to an example.
With this method we have dealt with multicollinearity on the one hand
and, on the other, resolved the discrepancy between the significance of
the regression model and the non-significance of one or more coefficients.
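One widely used way to detect multicollinearity is the variance inflation factor (VIF). The sketch below illustrates that standard diagnostic only; it is not the new treatment method introduced in this research, which the text above does not specify. The data are synthetic, and the threshold of 10 is a common rule of thumb rather than a value from the source.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the rest."""
    n, p = X.shape
    vifs = []
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residuals = target - others @ coef
        r_squared = 1.0 - residuals.var() / target.var()
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)

# Synthetic predictors: x2 is nearly a copy of x1, x3 is independent (assumed).
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

vifs = variance_inflation_factors(X)
for name, vif in zip(["x1", "x2", "x3"], vifs):
    flag = "collinear" if vif > 10 else "ok"
    print(f"{name}: VIF = {vif:.1f} ({flag})")
```

Large VIF values for x1 and x2 signal that their coefficients are estimated with inflated variance, which is one way a regression model can be significant overall while individual coefficients are not.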