ترغب بنشر مسار تعليمي؟ اضغط هنا

Automatic Generation of High Quality CCGbanks for Parser Domain Adaptation

136   0   0.0 ( 0 )
 نشر من قبل Masashi Yoshikawa
 تاريخ النشر 2019
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We propose a new domain adaptation method for Combinatory Categorial Grammar (CCG) parsing, based on the idea of automatic generation of CCG corpora exploiting cheaper resources of dependency trees. Our solution is conceptually simple, and not relying on a specific parser architecture, making it applicable to the current best-performing parsers. We conduct extensive parsing experiments with detailed discussion; on top of existing benchmark datasets on (1) biomedical texts and (2) question sentences, we create experimental datasets of (3) speech conversation and (4) math problems. When applied to the proposed method, an off-the-shelf CCG parser shows significant performance gains, improving from 90.7% to 96.6% on speech conversation, and from 88.5% to 96.8% on math problems.



قيم البحث

اقرأ أيضاً

We propose an end-to-end approach for synthetic QA data generation. Our model comprises a single transformer-based encoder-decoder network that is trained end-to-end to generate both answers and questions. In a nutshell, we feed a passage to the enco der and ask the decoder to generate a question and an answer token-by-token. The likelihood produced in the generation process is used as a filtering score, which avoids the need for a separate filtering model. Our generator is trained by fine-tuning a pretrained LM using maximum likelihood estimation. The experimental results indicate significant improvements in the domain adaptation of QA models outperforming current state-of-the-art methods.
Recently, semantic parsing has attracted much attention in the community. Although many neural modeling efforts have greatly improved the performance, it still suffers from the data scarcity issue. In this paper, we propose a novel semantic parser fo r domain adaptation, where we have much fewer annotated data in the target domain compared to the source domain. Our semantic parser benefits from a two-stage coarse-to-fine framework, thus can provide different and accurate treatments for the two stages, i.e., focusing on domain invariant and domain specific information, respectively. In the coarse stage, our novel domain discrimination component and domain relevance attention encourage the model to learn transferable domain general structures. In the fine stage, the model is guided to concentrate on domain related details. Experiments on a benchmark dataset show that our method consistently outperforms several popular domain adaptation strategies. Additionally, we show that our model can well exploit limited target data to capture the difference between the source and target domain, even when the target domain has far fewer training instances.
This paper is focused on the finetuning of acoustic models for speaker adaptation goals on a given gender. We pretrained the Transformer baseline model on Librispeech-960 and conduct experiments with finetuning on the gender-specific test subsets and . In general, we do not obtain essential WER reduction by finetuning techniques by this approach. We achieved up to ~5% lower word error rate on the male subset and 3% on the female subset if the layers in the encoder and decoder are not frozen, but the tuning is started from the last checkpoints. Moreover, we adapted our base model on the full L2 Arctic dataset of accented speech and fine-tuned it for particular speakers and male and female genders separately. The models trained on the gender subsets obtained 1-2% higher accuracy when compared to the model tuned on the whole L2 Arctic dataset. Finally, we tested the concatenation of the pretrained x-vector voice embeddings and embeddings from a conventional encoder, but its gain in accuracy is not significant.
The cardiothoracic ratio (CTR), a clinical metric of heart size in chest X-rays (CXRs), is a key indicator of cardiomegaly. Manual measurement of CTR is time-consuming and can be affected by human subjectivity, making it desirable to design computer- aided systems that assist clinicians in the diagnosis process. Automatic CTR estimation through chest organ segmentation, however, requires large amounts of pixel-level annotated data, which is often unavailable. To alleviate this problem, we propose an unsupervised domain adaptation framework based on adversarial networks. The framework learns domain invariant feature representations from openly available data sources to produce accurate chest organ segmentation for unlabeled datasets. Specifically, we propose a model that enforces our intuition that prediction masks should be domain independent. Hence, we introduce a discriminator that distinguishes segmentation predictions from ground truth masks. We evaluate our systems prediction based on the assessment of radiologists and demonstrate the clinical practicability for the diagnosis of cardiomegaly. We finally illustrate on the JSRT dataset that the semi-supervised performance of our model is also very promising.
Neural natural language generation (NLG) and understanding (NLU) models are data-hungry and require massive amounts of annotated data to be competitive. Recent frameworks address this bottleneck with generative models that synthesize weak labels at s cale, where a small amount of training labels are expert-curated and the rest of the data is automatically annotated. We follow that approach, by automatically constructing a large-scale weakly-labeled data with a fine-tuned GPT-2, and employ a semi-supervised framework to jointly train the NLG and NLU models. The proposed framework adapts the parameter updates to the models according to the estimated label-quality. On both the E2E and Weather benchmarks, we show that this weakly supervised training paradigm is an effective approach under low resource scenarios and outperforming benchmark systems on both datasets when 100% of training data is used.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا