
Look Who's Not Talking


Published by: Joon Son Chung

Publication date: 2020

Research field: Informatics Engineering, Electronic Engineering

Language: English

Authors: Youngki Kwon, Hee Soo Heo, Jaesung Huh

Sound; Audio and Speech Processing


Abstract

The objective of this work is speaker diarisation of speech recordings in the wild. The ability to determine speech segments is a crucial part of diarisation systems, accounting for a large proportion of errors. In this paper, we present a simple but effective solution for speech activity detection based on speaker embeddings. In particular, we discover that the norm of the speaker embedding is an extremely effective indicator of speech activity. The method does not require an independent model for speech activity detection, and therefore allows speaker diarisation to be performed using a unified representation for both speaker modelling and speech activity detection. We perform a number of experiments on in-house and public datasets, in which our method outperforms popular baselines.
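The idea of using the embedding norm as a speech indicator can be illustrated with a minimal sketch. The abstract does not say how the norm is turned into a decision, so everything here is an assumption made for illustration: one embedding per fixed-length, non-overlapping window, a fixed `threshold`, and the helper names `embedding_norm` and `speech_segments`. It is not the authors' implementation.

```python
import math

def embedding_norm(embedding):
    """L2 norm of one speaker-embedding vector (a list of floats)."""
    return math.sqrt(sum(x * x for x in embedding))

def speech_segments(embeddings, threshold, hop_seconds):
    """Group consecutive windows whose embedding norm exceeds `threshold`.

    Assumption: embeddings[i] covers the time span
    [i * hop_seconds, (i + 1) * hop_seconds).
    Returns a list of (start_time, end_time) tuples in seconds.
    """
    segments = []
    start = None  # index of the first window of the current speech run
    for i, emb in enumerate(embeddings):
        is_speech = embedding_norm(emb) > threshold
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start * hop_seconds, i * hop_seconds))
            start = None
    if start is not None:  # close a run that reaches the end of the recording
        segments.append((start * hop_seconds, len(embeddings) * hop_seconds))
    return segments

# Toy 2-dimensional "embeddings"; real speaker embeddings are much larger.
toy = [[0.1, 0.0], [3.0, 4.0], [6.0, 8.0], [0.2, 0.1], [5.0, 12.0]]
print(speech_segments(toy, threshold=1.0, hop_seconds=0.5))
# [(0.5, 1.5), (2.0, 2.5)]
```

In practice the threshold would have to be tuned on held-out data, and the same embeddings could then be reused for speaker clustering, which is the unified representation the abstract refers to.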


You Jin Kim, Hee-Soo Heo, Soyeon Choe (2021)

In this work, we present a novel audio-visual dataset for active speaker detection in the wild. A speaker is considered active when his or her face is visible and the voice is audible simultaneously. Although active speaker detection is a crucial pre-processing step for many audio-visual tasks, there is no existing dataset of natural human speech to evaluate the performance of active speaker detection. We therefore curate the Active Speakers in the Wild (ASW) dataset which contains videos and co-occurring speech segments with dense speech activity labels. Videos and timestamps of audible segments are parsed and adopted from VoxConverse, an existing speaker diarisation dataset that consists of videos in the wild. Face tracks are extracted from the videos and active segments are annotated based on the timestamps of VoxConverse in a semi-automatic way. Two reference systems, a self-supervised system and a fully supervised one, are evaluated on the dataset to provide the baseline performances of ASW. Cross-domain evaluation is conducted in order to show the negative effect of dubbed videos in the training data.
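The definition of an active speaker given above (face visible and voice audible at the same time) amounts to intersecting time intervals. The sketch below shows that step only, under stated assumptions: a face track is a single `(start, end)` interval in seconds, the speech segments are already known to belong to the same person, and `active_segments` is a hypothetical helper name. The abstract does not describe the actual semi-automatic annotation procedure, so this is not a reconstruction of it.

```python
def active_segments(face_track, speech_segments):
    """Intersect one face-track interval with a list of speech intervals.

    face_track: (start, end) in seconds during which the face is visible.
    speech_segments: list of (start, end) in seconds during which the same
        person's voice is audible (assumed to be given, e.g. from
        diarisation timestamps).
    Returns the intervals where both conditions hold.
    """
    face_start, face_end = face_track
    active = []
    for seg_start, seg_end in speech_segments:
        start = max(face_start, seg_start)
        end = min(face_end, seg_end)
        if start < end:  # keep only non-empty overlaps
            active.append((start, end))
    return active

speech = [(0.0, 3.0), (4.0, 6.5), (9.0, 12.0), (13.0, 14.0)]
print(active_segments((2.0, 10.0), speech))
# [(2.0, 3.0), (4.0, 6.5), (9.0, 10.0)]
```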

Computer Vision and Pattern Recognition; Sound; Audio and Speech Processing

Look Who's Talking: Inferring Speaker Attributes from Personal Longitudinal Dialog

Charles Welch, Veronica Perez-Rosas, Jonathan K. Kummerfeld (2019)

We examine a large dialog corpus obtained from the conversation history of a single individual with 104 conversation partners. The corpus consists of half a million instant messages, across several messaging platforms. We focus our analyses on seven speaker attributes, each of which partitions the set of speakers, namely: gender; relative age; family member; romantic partner; classmate; co-worker; and native to the same country. In addition to the content of the messages, we examine conversational aspects such as the time messages are sent, messaging frequency, psycholinguistic word categories, linguistic mirroring, and graph-based features reflecting how people in the corpus mention each other. We present two sets of experiments predicting each attribute using (1) short context windows; and (2) a larger set of messages. We find that using all features leads to gains of 9-14% over using message text only.

Computation and Language; Artificial Intelligence

Look Who's Talking: Interpretable Machine Learning for Assessing Italian SMEs' Credit Default

Lisa Crosato, Caterina Liberati, Marco Repetto (2021)

Academic research and the financial industry have recently paid great attention to Machine Learning algorithms due to their power to solve complex learning tasks. In the field of firms' default prediction, however, the lack of interpretability has prevented the extensive adoption of black-box models. To overcome this drawback and maintain the high performance of black-box models, this paper relies on a model-agnostic approach. Accumulated Local Effects and Shapley values are used to shape the predictors' impact on the likelihood of default and rank them according to their contribution to the model outcome. Prediction is achieved by two Machine Learning algorithms (eXtreme Gradient Boosting and FeedForward Neural Network) compared with three standard discriminant models. Results show that our analysis of the Italian Small and Medium Enterprises manufacturing industry benefits from the overall highest classification power by the eXtreme Gradient Boosting algorithm without giving up a rich interpretation framework.
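To make the role of Shapley values in ranking predictors concrete, here is a minimal sketch that computes them exactly by enumerating feature orderings. It rests on assumptions that are not from the paper: `toy_model` is a hypothetical three-feature function standing in for a trained classifier, "absent" features are replaced by a fixed `baseline` value, and exhaustive enumeration is used, which is only feasible for a handful of features.

```python
from itertools import permutations

def shapley_values(model, instance, baseline):
    """Exact Shapley attributions for one prediction.

    For every ordering of the features, switch features from their
    baseline value to the instance value one at a time and credit each
    feature with the resulting change in model output. Averaging over
    all orderings gives the Shapley value of each feature.
    """
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        for feature in order:
            current[feature] = instance[feature]
            output = model(current)
            totals[feature] += output - previous_output
            previous_output = output
    return [total / len(orderings) for total in totals]

def toy_model(x):
    # Hypothetical score: one additive term plus one interaction term.
    return 2.0 * x[0] + x[1] * x[2]

instance = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, instance, baseline)

# Rank features by absolute contribution, largest first.
ranking = sorted(range(len(phi)), key=lambda i: -abs(phi[i]))
print(phi, ranking)
```

The attributions always sum to `toy_model(instance) - toy_model(baseline)`, which is 8.0 here; the additive feature receives its full term, while the two interacting features split their joint effect equally.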

Machine Learning; Econometrics

Look Who's Talking Now: Implications of AV's Explanations on Driver's Trust, AV Preference, Anxiety and Mental Workload

Na Du, Jacob Haspiel, Qiaoning Zhang (2019)

Explanations given by automation are often used to promote automation adoption. However, it remains unclear whether explanations promote acceptance of automated vehicles (AVs). In this study, we conducted a within-subject experiment in a driving simulator with 32 participants, using four different conditions. The four conditions included: (1) no explanation, (2) explanation given before or (3) after the AV acted and (4) the option for the driver to approve or disapprove the AV's action after hearing the explanation. We examined four AV outcomes: trust, preference for AV, anxiety and mental workload. Results suggest that explanations provided before an AV acted were associated with higher trust in and preference for the AV, but there was no difference in anxiety and workload. These results have important implications for the adoption of AVs.

Human-Computer Interaction; Computers and Society; Robotics

Who's talking first? Consensus or lack thereof in coevolving opinion formation models

Cecilia Nardini (2007)

We investigate different opinion formation models on adaptive network topologies. Depending on the dynamical process, rewiring can either (i) lead to the elimination of interactions between agents in different states, and accelerate the convergence to a consensus state or break the network into non-interacting groups or (ii) counter-intuitively, favor the existence of diverse interacting groups for exponentially long times. The mean-field analysis allows us to elucidate the mechanisms at play. Strikingly, allowing the interacting agents to bear more than one opinion at the same time drastically changes the model's behavior and leads to fast consensus.

Physics and Society


Read also