Do you want to publish a course? Click here

Simple and Efficient ways to Improve REALM

95   0   0.0 ( 0 )
 Publication date 2021
and research's language is English




Ask ChatGPT about the research

Dense retrieval has been shown to be effective for retrieving relevant documents for Open Domain QA, surpassing popular sparse retrieval methods like BM25. REALM (Guu et al., 2020) is an end-to-end dense retrieval system that relies on MLM based pretraining for improved downstream QA efficiency across multiple datasets. We study the finetuning of REALM on various QA tasks and explore the limits of various hyperparameter and supervision choices. We find that REALM was significantly undertrained when finetuning and simple improvements in the training, supervision, and inference setups can significantly benefit QA results and exceed the performance of other models published post it. Our best model, REALM++, incorporates all the best working findings and achieves significant QA accuracy improvements over baselines (~5.5% absolute accuracy) without any model design changes. Additionally, REALM++ matches the performance of large Open Domain QA models which have 3x more parameters demonstrating the efficiency of the setup.



rate research

Read More

A sequence-to-sequence learning with neural networks has empirically proven to be an effective framework for Chinese Spelling Correction (CSC), which takes a sentence with some spelling errors as input and outputs the corrected one. However, CSC models may fail to correct spelling errors covered by the confusion sets, and also will encounter unseen ones. We propose a method, which continually identifies the weak spots of a model to generate more valuable training instances, and apply a task-specific pre-training strategy to enhance the model. The generated adversarial examples are gradually added to the training set. Experimental results show that such an adversarial training method combined with the pretraining strategy can improve both the generalization and robustness of multiple CSC models across three different datasets, achieving stateof-the-art performance for CSC task.
The focus of our paper is the identification and correction of non-word errors in OCR text. Such errors may be the result of incorrect insertion, deletion, or substitution of a character, or the transposition of two adjacent characters within a single word. Or, it can be the result of word boundary problems that lead to run-on errors and incorrect-split errors. The traditional N-gram correction methods can handle single-word errors effectively. However, they show limitations when dealing with split and merge errors. In this paper, we develop an unsupervised method that can handle both errors. The method we develop leads to a sizable improvement in the correction rates. This tutorial paper addresses very difficult word correction problems - namely incorrect run-on and split errors - and illustrates what needs to be considered when addressing such problems. We outline a possible approach and assess its success on a limited study.
345 - Zhen Wu , Lijun Wu , Qi Meng 2021
Transformer architecture achieves great success in abundant natural language processing tasks. The over-parameterization of the Transformer model has motivated plenty of works to alleviate its overfitting for superior performances. With some explorations, we find simple techniques such as dropout, can greatly boost model performance with a careful design. Therefore, in this paper, we integrate different dropout techniques into the training of Transformer models. Specifically, we propose an approach named UniDrop to unites three different dropout techniques from fine-grain to coarse-grain, i.e., feature dropout, structure dropout, and data dropout. Theoretically, we demonstrate that these three dropouts play different roles from regularization perspectives. Empirically, we conduct experiments on both neural machine translation and text classification benchmark datasets. Extensive results indicate that Transformer with UniDrop can achieve around 1.5 BLEU improvement on IWSLT14 translation tasks, and better accuracy for the classification even using strong pre-trained RoBERTa as backbone.
In the research there is reviewed the peculiarities of the formation of tax revenues of the state budget, analysis of the recent past and present periods of tax system in Georgia, there is reviewed the influence of existing factors on the revenues, as well as the role and the place of direct and indirect taxes in the state budget revenues. In addition, the measures of stimulating action on formation of tax revenues and their impact on the state budget revenues are established. At the final stage, there are examples of foreign developed countries, where the tax system is perfectly developed, where various stimulating measures are successfully stimulating and consequently it promotes mobilization of the amount of money required in the state budget. The exchange of foreign experience is very important for Georgia, the existing tax model that is based on foreign experience is greatly successful. For the formation of tax policy, it is necessary to take into consideration all the factors affecting on it, a complex analysis of the tax system and the steps that will be really useful and perspective for our country.
We present a simple, efficient and robust approach to improve cosmological redshift measurements. The method is based on the presence of a reference sample for which a precise redshift number distribution (dN/dz) can be obtained for different pencil-beam-like sub-volumes within the original survey. For each sub-volume we then impose: (i) that the redshift number distribution of the uncertain redshift measurements matches the reference dN/dz corrected by their selection functions; and (ii) the rank order in redshift of the original ensemble of uncertain measurements is preserved. The latter step is motivated by the fact that random variables drawn from Gaussian probability density functions (PDFs) of different means and arbitrarily large standard deviations satisfy stochastic ordering. We then repeat this simple algorithm for multiple arbitrary pencil-beam-like overlapping sub-volumes; in this manner, each uncertain measurement has multiple (non-independent) recovered redshifts which can be used to estimate a new redshift PDF. We refer to this method as the Stochastic Order Redshift Technique (SORT). We have used a state-of-the-art N-body simulation to test the performance of SORT under simple assumptions and found that it can improve the quality of cosmological redshifts in an efficient and robust manner. Particularly, SORT redshifts are able to recover the distinctive features of the cosmic web and can provide unbiased measurement of the two-point correlation function on scales > 4 Mpc/h. Given its simplicity, we envision that a method like SORT can be incorporated into more sophisticated algorithms aimed to exploit the full potential of large extragalactic photometric surveys.
comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا