
Different Strokes for Different Folks: Investigating Appropriate Further Pre-training Approaches for Diverse Dialogue Tasks


Publication date: 2021
Language: English
Created by Shamra Editor





Loading models pre-trained on large-scale general-domain corpora and fine-tuning them on specific downstream tasks has gradually become a paradigm in Natural Language Processing. Previous investigations show that introducing a further pre-training phase between the pre-training and fine-tuning phases, in order to adapt the model to domain-specific unlabeled data, can bring positive effects. However, most of this further pre-training work simply keeps running the conventional pre-training task, e.g., masked language modeling, which can be regarded as domain adaptation to bridge the data-distribution gap. After observing diverse downstream tasks, we suggest that different tasks may also need a further pre-training phase with appropriate training tasks to bridge the task-formulation gap. To investigate this, we carry out a study on improving multiple task-oriented dialogue downstream tasks by designing various training tasks for the further pre-training phase. The experiments show that different downstream tasks prefer different further pre-training tasks, which have an intrinsic correlation, and that most further pre-training tasks significantly improve certain target tasks rather than all of them. Our investigation indicates that it is of great importance and effectiveness to design appropriate further pre-training tasks that model the specific information benefiting downstream tasks. In addition, we present several constructive empirical conclusions for enhancing task-oriented dialogue.
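
For readers unfamiliar with the pipeline the abstract describes, the sketch below illustrates the conventional variant of the further pre-training phase: continuing masked language modeling on in-domain text before task-specific fine-tuning. It is a minimal sketch, assuming the Hugging Face Transformers and Datasets libraries, a generic bert-base-uncased checkpoint, and a hypothetical unlabeled file dialogue_corpus.txt; it is not the authors' actual setup.

```python
# A minimal sketch of "pre-train -> further pre-train -> fine-tune", assuming the
# Hugging Face Transformers/Datasets libraries, a generic "bert-base-uncased"
# checkpoint, and a hypothetical file "dialogue_corpus.txt" (one dialogue turn per line).
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")  # general-domain checkpoint

# Further pre-training phase: keep running masked language modeling,
# but on in-domain dialogue text instead of the general corpus.
corpus = load_dataset("text", data_files={"train": "dialogue_corpus.txt"})["train"]
corpus = corpus.map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
                    batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="further-pretrained",
                           num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=corpus,
    data_collator=collator,
)
trainer.train()
model.save_pretrained("further-pretrained")  # later reloaded for task-specific fine-tuning
```

The saved checkpoint would then be loaded with a task-specific head (e.g., for dialogue state tracking or response selection) and fine-tuned on labeled data; the paper's point is that the masking objective above is only one of several possible further pre-training tasks.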




Read More

Cultivating and preparing the soil for farming field crops, together with adding fertilizers, are among the most important practices of modern agriculture. Based on this importance, the research was carried out in the area north-east of Homs city during the 2013-2014 seasons, using five methods of soil cultivation.
Further pre-training language models on in-domain data (domain-adaptive pre-training, DAPT) or task-relevant data (task-adaptive pre-training, TAPT) before fine-tuning has been shown to improve downstream task performance. However, in task-oriented dialog modeling, we observe that further pre-training with MLM does not always boost performance on a downstream task. We find that DAPT is beneficial in the low-resource setting, but as the fine-tuning data size grows, DAPT becomes less beneficial or even useless, and scaling up the size of the DAPT data does not help. Through Representational Similarity Analysis, we conclude that more fine-tuning data yields a greater change in the model's representations and thus reduces the influence of the initialization.
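
As a companion to the abstract above, here is a rough sketch of how Representational Similarity Analysis can compare a model's representations before and after fine-tuning. It assumes NumPy/SciPy and uses random arrays as stand-ins for real hidden states, so it only illustrates the general idea rather than the paper's exact procedure.

```python
# A rough sketch of Representational Similarity Analysis (RSA), assuming NumPy/SciPy;
# the random arrays below are stand-ins for hidden states of shape (n_examples, hidden_dim)
# extracted for the same probe sentences before and after fine-tuning.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa_score(h1: np.ndarray, h2: np.ndarray) -> float:
    """Correlate the pairwise-distance structure of two representation spaces."""
    rdm1 = pdist(h1, metric="cosine")  # condensed representational dissimilarity matrix
    rdm2 = pdist(h2, metric="cosine")
    rho, _ = spearmanr(rdm1, rdm2)
    return float(rho)

rng = np.random.default_rng(0)
before = rng.normal(size=(50, 768))                 # stand-in: representations before fine-tuning
after = before + 0.5 * rng.normal(size=(50, 768))   # stand-in: representations after fine-tuning
print(f"RSA similarity (before vs. after): {rsa_score(before, after):.3f}")
```

A lower score indicates that fine-tuning has changed the representational geometry more, which is the kind of signal the abstract uses to argue that large fine-tuning sets wash out the effect of the initialization.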
Low-resource languages can be understood as languages that are more scarce, less studied, less privileged, less commonly taught and for which fewer resources are available (Singh, 2008; Cieri et al., 2016; Magueresse et al., 2020). Natural Language Processing (NLP) research and technology mainly focus on languages for which large data sets are available. To illustrate the differences in data availability: there are 6 million Wikipedia articles available for English, 2 million for Dutch, and merely 82 thousand for Albanian. The scarce-data issue becomes increasingly apparent when large parallel data sets are required for applications such as Neural Machine Translation (NMT). In this work, we investigate to what extent translation between Albanian (SQ) and Dutch (NL) is possible, comparing a one-to-one (SQ↔NL) model, a low-resource pivot-based approach (with English (EN) as pivot) and a zero-shot translation (ZST) system (Johnson et al., 2016; Mattoni et al., 2017). Our experiments show that the EN-pivot model outperforms both the direct one-to-one and the ZST model. Since small amounts of parallel data are often available for low-resource languages or settings, experiments were conducted using small sets of parallel NL↔SQ data. The ZST appeared to be the worst-performing model. Even when the available parallel data (NL↔SQ) was added, i.e. in a few-shot setting (FST), it remained the worst-performing system according to the automatic (BLEU and TER) and human evaluations.
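
To make the pivot-based setup concrete, the following sketch chains two translation models through English (SQ → EN → NL). It assumes the Hugging Face MarianMT interface; the checkpoint names are placeholders and may not correspond to actually released models, and the sketch is not the system evaluated in the abstract.

```python
# An illustrative sketch of pivot-based translation (SQ -> EN -> NL), assuming the
# Hugging Face MarianMT interface; the checkpoint names below are placeholders.
from transformers import MarianMTModel, MarianTokenizer

def load_pair(name: str):
    tokenizer = MarianTokenizer.from_pretrained(name)
    return tokenizer, MarianMTModel.from_pretrained(name)

def translate(text: str, tokenizer, model) -> str:
    batch = tokenizer([text], return_tensors="pt", padding=True)
    generated = model.generate(**batch, max_length=128)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]

# Leg 1: Albanian -> English (the pivot language); leg 2: English -> Dutch.
sq_en_tok, sq_en_model = load_pair("Helsinki-NLP/opus-mt-sq-en")  # placeholder name
en_nl_tok, en_nl_model = load_pair("Helsinki-NLP/opus-mt-en-nl")  # placeholder name

albanian_sentence = "Si jeni sot?"
english_pivot = translate(albanian_sentence, sq_en_tok, sq_en_model)
dutch_output = translate(english_pivot, en_nl_tok, en_nl_model)
print(dutch_output)
```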
Objectives: This was a prospective study conducted to analyze the intraoperative and postoperative complications of abdominal versus vaginal hysterectomy. Methods: The study was carried out on 120 patients (85 abdominal and 35 vaginal hysterectomies) in the department of gynecology at Al-Assad University Hospital in Lattakia between 1/7/2013 and 1/7/2014. Results: The mean duration of surgery was 103 min for abdominal hysterectomy and 91 min for vaginal hysterectomy (p=0.0192). Wound infection was the main cause of febrile morbidity in the abdominal hysterectomy group, whereas urinary tract infection was the main cause of febrile morbidity in the vaginal hysterectomy group. There were 3 (3.5%) cases of bladder injury and 2 (2.8%) cases of ureteric injury in the abdominal hysterectomy group, while there were none in the vaginal hysterectomy group. Postoperatively there were 3 (3.5%) cases of secondary haemorrhage in the TAH group versus 1 (2.8%) case in the vaginal hysterectomy group, and 8 (9.4%) cases of paralytic ileus in abdominal hysterectomy versus none in vaginal hysterectomy. Overall, 45 (52.9%) abdominal hysterectomy cases and 12 (34.2%) vaginal hysterectomy cases had complications (p=0.029). Conclusions: This study showed that vaginal hysterectomy was associated with fewer intraoperative and postoperative complications than abdominal hysterectomy.
Neural machine translation (NMT) models are typically trained using a softmax cross-entropy loss in which the softmax distribution is compared against the gold labels. In low-resource scenarios, NMT models tend to perform poorly because model training quickly converges to a point where the softmax distribution computed from the logits approaches the gold label distribution. Although label smoothing is a well-known solution to this issue, we further propose dividing the logits by a temperature coefficient greater than one, forcing the softmax distribution to be smoother during training. This makes it harder for the model to over-fit quickly. In our experiments on 11 language pairs from the low-resource Asian Language Treebank dataset, we observed significant improvements in translation quality. Our analysis focuses on finding the right balance of label smoothing and softmax tempering, and indicates that they are orthogonal methods. Finally, a study of softmax entropies and gradients reveals the impact of our method on the internal behavior of our NMT models.
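
The core idea of softmax tempering combined with label smoothing can be written in a few lines; the PyTorch sketch below is illustrative only, with the temperature and smoothing values chosen arbitrarily rather than taken from the paper.

```python
# A small PyTorch sketch of softmax tempering combined with label smoothing; the
# temperature and smoothing values are illustrative, not the paper's tuned settings.
import torch
import torch.nn.functional as F

def tempered_smoothed_ce(logits: torch.Tensor, targets: torch.Tensor,
                         temperature: float = 2.0, smoothing: float = 0.1) -> torch.Tensor:
    """Cross-entropy of temperature-divided logits against a smoothed gold distribution."""
    vocab_size = logits.size(-1)
    # Dividing the logits by a temperature > 1 flattens the softmax during training.
    log_probs = F.log_softmax(logits / temperature, dim=-1)
    # Smoothed gold labels: (1 - smoothing) on the gold token, the rest spread uniformly.
    smooth_targets = torch.full_like(log_probs, smoothing / (vocab_size - 1))
    smooth_targets.scatter_(-1, targets.unsqueeze(-1), 1.0 - smoothing)
    return -(smooth_targets * log_probs).sum(dim=-1).mean()

logits = torch.randn(8, 32000)           # (batch, vocab) toy decoder logits
targets = torch.randint(0, 32000, (8,))  # toy gold token ids
print(tempered_smoothed_ce(logits, targets))
```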
