
To Be or Not To Be a Verbal Multiword Expression: A Quest for Discriminating Features

Published by Carlos Ramisch
Publication date: 2020
Research field: Informatics Engineering
Research language: English
Author: Caroline Pasquer





Automatic identification of multiword expressions (MWEs) is a prerequisite for semantically-oriented downstream applications. This task is challenging because MWEs, especially verbal ones (VMWEs), exhibit surface variability. However, this variability is usually more restricted than in regular (non-VMWE) constructions, which leads to various variability profiles. We use this fact to determine the optimal set of features which could be used in a supervised classification setting to solve a subproblem of VMWE identification: the identification of occurrences of previously seen VMWEs. Surprisingly, a simple custom frequency-based feature selection method proves more efficient than other standard methods such as the Chi-squared test, information gain or decision trees. An SVM classifier using the optimal set of only 6 features outperforms the best systems from a recent shared task on the French seen data.
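As an illustration of one of the standard baselines named in the abstract, the sketch below ranks binary features by Pearson's Chi-squared statistic against a binary label (true VMWE occurrence vs. coincidental co-occurrence). It is not the authors' custom frequency-based method, whose details the abstract does not give. The feature names and the toy data are hypothetical, chosen only to make the example run.

```python
from typing import Dict, List, Sequence


def chi_squared(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Pearson Chi-squared statistic for a binary feature vs. a binary label."""
    n = len(labels)
    # Observed counts of the 2x2 contingency table: table[feature][label].
    table = [[0, 0], [0, 0]]
    for f, y in zip(feature, labels):
        table[f][y] += 1
    row = [sum(table[0]), sum(table[1])]
    col = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    stat = 0.0
    for f in (0, 1):
        for y in (0, 1):
            expected = row[f] * col[y] / n
            if expected > 0:  # skip empty rows/columns to avoid dividing by zero
                stat += (table[f][y] - expected) ** 2 / expected
    return stat


def rank_features(columns: Dict[str, List[int]], labels: List[int]) -> List[str]:
    """Return feature names sorted from most to least label-dependent."""
    return sorted(columns, key=lambda name: chi_squared(columns[name], labels),
                  reverse=True)


# Toy data (hypothetical): 1 = candidate is a true VMWE occurrence, 0 = it is not.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "noun_number_matches": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly aligned with label
    "is_contiguous":       [1, 1, 1, 0, 1, 0, 0, 0],  # partially aligned
    "is_passive":          [1, 0, 1, 0, 1, 0, 1, 0],  # independent of label
}

ranking = rank_features(columns, labels)
print(ranking)
```

Keeping only the top-ranked features and passing them to a classifier would mirror the general supervised setting the abstract describes, under the assumptions stated above.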




Read also

We review some aspects, especially those we can tackle analytically, of a minimal model of closed economy analogous to the kinetic theory model of ideal gases where the agents exchange wealth amongst themselves such that the total wealth is conserved, and each individual agent saves a fraction ($0 < \lambda < 1$) of wealth before transaction. We are interested in the special case where the fraction $\lambda$ is constant for all the agents (global saving propensity) in the closed system. We show by moment calculations that the resulting wealth distribution cannot be the Gamma distribution that was conjectured in Phys. Rev. E 70, 016104 (2004). We also derive a form for the distribution at low wealth, which is a new result.
In cases where both components of a binary system show oscillations, asteroseismology has been proposed as a method to identify the system. For KIC 2568888, observed with $Kepler$, we detect oscillation modes for two red giants in a single power density spectrum. Through an asteroseismic study we investigate if the stars have similar properties, which could be an indication that they are physically bound into a binary system. While one star lies on the red giant branch (RGB), the other, more evolved star, is either a RGB or asymptotic-giant-branch star. We found similar ages for the red giants and a mass ratio close to 1. Based on these asteroseismic results we propose KIC 2568888 as a rare candidate binary system ($\sim 0.1\%$ chance). However, when combining the asteroseismic data with ground-based $BVI$ photometry we estimated different distances for the stars, which we cross-checked with $Gaia$ DR2. From $Gaia$ we obtained for one object a distance between and broadly consistent with the distances from $BVI$ photometry. For the other object we have a negative parallax with a not yet reliable $Gaia$ distance solution. The derived distances challenge a binary interpretation and may either point to a triple system, which could explain the visible magnitudes, or, to a rare chance alignment ($\sim 0.05\%$ chance based on stellar magnitudes). This probability would even be smaller, if calculated for close pairs of stars with a mass ratio close to unity in addition to similar magnitudes, which may indeed indicate that a binary scenario is more favourable.
M. Pohlen, 2005
We have in recent years come to view the outer parts of galaxies as having vital clues about their formation and evolution. Here, we would like to briefly present our results from a complete sample of nearby, late-type, spiral galaxies, using data from the SDSS survey, especially focused on the stellar light distribution in the outer disk. Our study shows that only a minority of late-type galaxies show a classical, exponential Freeman Type I profile down to the noise limit, whereas the majority exhibit either downbending (stellar truncation as introduced in 1979 by Piet van der Kruit) or upbending profiles.
The cosmological missing baryons at z<1 most likely hide in the hot (T$\gtrsim10^{5.5}$ K) phase of the Warm Hot Intergalactic Medium (WHIM). While the hot WHIM is hard to detect due to its high ionisation level, the warm (T$\lesssim10^{5.5}$ K) phase of the WHIM has been very robustly detected in the FUV band. We adopted the assumption that the hot and warm WHIM phases are co-located and thus used the FUV-detected warm WHIM as a tracer for the cosmologically interesting hot WHIM. We utilised the assumption by performing an X-ray follow-up in the sight line of a blazar PKS 2155-304 at the redshifts where previous FUV measurements of OVI, SiIV and BLA absorption have indicated the existence of the warm WHIM. We looked for the OVII He$\alpha$ and OVIII Ly$\alpha$ absorption lines, the most likely hot WHIM tracers. Despite the very large exposure time ($\approx$ 1 Ms), the XMM-Newton/RGS1 data yielded no significant detection, which corresponds to upper limits of $\log{N({\rm OVII})({\rm cm}^{-2})} \le 14.5-15.2$ and $\log{N({\rm OVIII})({\rm cm}^{-2})} \le 14.9-15.2$. An analysis of LETG/HRC data yielded consistent results. However, the LETG/ACIS data yielded a detection of an absorption-line-like feature at $\lambda \approx$ 20 Å at a simple one-parameter uncertainty-based confidence level of 3.7 $\sigma$, consistent with several earlier LETG/ACIS reports. Given the high statistical quality of the RGS1 data, the possibility of RGS1 accidentally missing the true line at $\lambda \sim$ 20 Å is very low, 0.006%. Neglecting this, the LETG/ACIS detection can be interpreted as the Ly$\alpha$ transition of OVIII at one of the redshifts (z$\approx$ 0.054) of FUV-detected warm WHIM. Given the very convincing X-ray spectral evidence for and against the existence of the $\lambda \sim$ 20 Å feature, we cannot conclude whether or not it is a true astrophysical absorption line.
Due to the discrete nature of words, language GANs require to be optimized from rewards provided by discriminator networks, via reinforcement learning methods. This is a much harder setting than for continuous tasks, which enjoy gradient flows from discriminators to generators, usually leading to dramatic learning instabilities. However, we claim that this can be solved by making discriminator and generator networks cooperate to produce output sequences during training. These cooperative outputs, inherently built to obtain higher discrimination scores, not only provide denser rewards for training, but also form a more compact artificial set for discriminator training, hence improving its accuracy and stability. In this paper, we show that our SelfGAN framework, built on this cooperative principle, outperforms Teacher Forcing and obtains state-of-the-art results on two challenging tasks, Summarization and Question Generation.