New Domain, Major Effort? How Much Data is Necessary to Adapt a Temporal Tagger to the Voice Assistant Domain


Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Abstract

Reliable tagging of Temporal Expressions (TEs, e.g., Book a table at L'Osteria for Sunday evening) is a central requirement for Voice Assistants (VAs). However, there is a dearth of resources and systems for the VA domain, since publicly-available temporal taggers are trained only on substantially different domains, such as news and clinical text. Since the cost of annotating large datasets is prohibitive, we investigate the trade-off between in-domain data and performance in DA-Time, a hybrid temporal tagger for the English VA domain which combines a neural architecture for robust TE recognition, with a parser-based TE normalizer. We find that transfer learning goes a long way even with as little as 25 in-domain sentences: DA-Time performs at the state of the art on the news domain, and substantially outperforms it on the VA domain.
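To make the two stages named in the abstract concrete (recognizing a TE, then normalizing it), here is a minimal toy sketch. It is not DA-Time: a regular expression stands in for the neural recognizer, a small rule stands in for the parser-based normalizer, and the function name `tag_and_normalize`, the TIMEX3-style part-of-day suffixes, and the rule "resolve a weekday to its next occurrence on or after the reference date" are all assumptions made for illustration.

```python
import re
from datetime import date, timedelta

# Hypothetical toy vocabulary; a real tagger covers far more expressions.
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]
# Assumed TIMEX3-style part-of-day codes.
PARTS_OF_DAY = {"morning": "MO", "afternoon": "AF", "evening": "EV", "night": "NI"}

# Recognition step: a weekday, optionally followed by a part of day.
TE_PATTERN = re.compile(
    r"\b(?P<weekday>" + "|".join(WEEKDAYS) + r")"
    r"(?:\s+(?P<part>" + "|".join(PARTS_OF_DAY) + r"))?\b",
    re.IGNORECASE,
)


def tag_and_normalize(utterance, reference):
    """Return (surface text, normalized value) pairs for each TE found."""
    results = []
    for match in TE_PATTERN.finditer(utterance):
        # Normalization step: resolve the weekday relative to the reference
        # date (assumption: next occurrence on or after that date).
        target = WEEKDAYS.index(match.group("weekday").lower())
        days_ahead = (target - reference.weekday()) % 7
        value = (reference + timedelta(days=days_ahead)).isoformat()
        part = match.group("part")
        if part:
            value += "T" + PARTS_OF_DAY[part.lower()]
        results.append((match.group(0), value))
    return results


if __name__ == "__main__":
    # Reference date 2021-06-02 is a Wednesday.
    print(tag_and_normalize(
        "Book a table at L'Osteria for Sunday evening", date(2021, 6, 2)))
    # [('Sunday evening', '2021-06-06TEV')]
```

The reference date matters because an expression such as "Sunday evening" has no fixed value on its own; the same utterance normalizes to a different date depending on when it is spoken.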

References used

https://aclanthology.org/

How much pretraining data do language models need to learn syntax?

560 - Association for Computational Linguistics 2021 Article

Transformers-based pretrained language models achieve outstanding results in many well-known NLU benchmarks. However, while pretraining methods are very convenient, they are expensive in terms of time and resources. This calls for a study of the impact of pretraining data size on the knowledge of the models. We explore this impact on the syntactic capabilities of RoBERTa, using models trained on incremental sizes of raw text data. First, we use syntactic structural probes to determine whether models pretrained on more data encode a higher amount of syntactic information. Second, we perform a targeted syntactic evaluation to analyze the impact of pretraining data size on the syntactic generalization performance of the models. Third, we compare the performance of the different models on three downstream applications: part-of-speech tagging, dependency parsing and paraphrase identification. We complement our study with an analysis of the cost-benefit trade-off of training such models. Our experiments show that while models pretrained on more data encode more syntactic knowledge and perform better on downstream applications, they do not always offer a better performance across the different syntactic phenomena and come at a higher financial and environmental cost.

learn syntax, pretraining data size

Generation of Synthetic Time Histories Functions Compatible with Syrian Response Spectra in Frequency Domain and Time Domain Applicable to Dynamic Analysis

2050 - Al-Baath University 2016 Research paper

In this study, basic methodologies and procedures for generating synthetic time histories in the time domain and frequency domain are summarized. These synthetic time histories match the Syrian spectrum and are compatible with a wide range of building models and soil types according to the seismic parameters of Lattakia city. This paper discusses the selection and scaling criteria of three real time history records available in strong ground motion databases to satisfy the Syrian spectrum, and their suitability as input to time history analysis of civil engineering structures.

Time domain, Scaling, Selection, Accelerograms, Frequency domain, Spectrum matching

Description of the data in the university domain using semantic web technologies

1914 - Tishreen University 2017 Research paper

In recent years, a new web has appeared alongside the traditional web. It is called the Web of Linked Data, and it has been developed to present data in a machine-readable form. The main idea is to describe data using a set of terms called a web ontology. At this time, tools and standards related to the semantic web are becoming comprehensive and stable; however, publishing university data as linked data still faces some major challenges. First of all, there is no unified, well-accepted vocabulary for describing university-related information. This article aims to find an ontology that can be used to describe data in the university domain, so that this data can be integrated with data from other universities and queried. The web ontology was built by reusing the available vocabularies on the web and adding new classes and properties. The ontology has been organized by using Protégé.

semantic web, Linked Data, Web Ontology, Web of Data, Vocabularies

DART: Open-Domain Structured Data Record to Text Generation

411 - Association for Computational Linguistics 2021 Article

We present DART, an open domain structured DAta Record to Text generation dataset with over 82k instances (DARTs). Data-to-text annotations can be a costly process, especially when dealing with tables which are the major source of structured data and contain nontrivial structures. To this end, we propose a procedure of extracting semantic triples from tables that encodes their structures by exploiting the semantic dependencies among table headers and the table title. Our dataset construction framework effectively merged heterogeneous sources from open domain semantic parsing and spoken dialogue systems by utilizing techniques including tree ontology annotation, question-answer pair to declarative sentence conversion, and predicate unification, all with minimum post-editing. We present systematic evaluation on DART as well as new state-of-the-art results on WebNLG 2017 to show that DART (1) poses new challenges to existing data-to-text datasets and (2) facilitates out-of-domain generalization. Our data and code can be found at https://github.com/Yale-LILY/dart.

structured data record, record to text

Bootstrapping a Music Voice Assistant with Weak Supervision

573 - Association for Computational Linguistics 2021 Article

One of the first building blocks to create a voice assistant relates to the task of tagging entities or attributes in user queries. This can be particularly challenging when entities are in the tens of millions, as is the case of, e.g., music catalogs. Training slot tagging models at an industrial scale requires large quantities of accurately labeled user queries, which are often hard and costly to gather. On the other hand, voice assistants typically collect plenty of unlabeled queries that often remain unexploited. This paper presents a weakly-supervised methodology to label large amounts of voice query logs, enhanced with a manual filtering step. Our experimental evaluations show that slot tagging models trained on weakly-supervised data outperform models trained on hand-annotated or synthetic data, at a lower cost. Further, manual filtering of weakly-supervised data leads to a very significant reduction in Sentence Error Rate, while allowing us to drastically reduce human curation efforts from weeks to hours, with respect to hand-annotation of queries. The method is applied to successfully bootstrap a slot tagging system for a major music streaming service that currently serves several tens of thousands of daily voice queries.

improvement, nlu reranking, music voice assistant, voice assistant
