Serial Speakers: a Dataset of TV Series

200 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Xavier Bost

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Xavier Bost

استرجاع المعلومات الوسائط المتعددة

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

For over a decade, TV series have been drawing increasing interest, both from the audience and from various academic fields. But while most viewers are hooked on the continuous plots of TV serials, the few annotated datasets available to researchers focus on standalone episodes of classical TV series. We aim at filling this gap by providing the multimedia/speech processing communities with Serial Speakers, an annotated dataset of 161 episodes from three popular American TV serials: Breaking Bad, Game of Thrones and House of Cards. Serial Speakers is suitable both for investigating multimedia retrieval in realistic use case scenarios, and for addressing lower level speech related tasks in especially challenging conditions. We publicly release annotations for every speech turn (boundaries, speaker) and scene boundary, along with annotations for shot boundaries, recurring shots, and interacting speakers in a subset of episodes. Because of copyright restrictions, the textual content of the speech turns is encrypted in the public version of the dataset, but we provide the users with a simple online tool to recover the plain text from their own subtitle files.

قيم البحث

184 - Xavier Bost 2018

Speaker diarization may be difficult to achieve when applied to narrative films, where speakers usually talk in adverse acoustic conditions: background music, sound effects, wide variations in intonation may hide the inter-speaker variability and mak e audio-based speaker diarization approaches error prone. On the other hand, such fictional movies exhibit strong regularities at the image level, particularly within dialogue scenes. In this paper, we propose to perform speaker diarization within dialogue scenes of TV series by combining the audio and video modalities: speaker diarization is first performed by using each modality, the two resulting partitions of the instance set are then optimally matched, before the remaining instances, corresponding to cases of disagreement between both modalities, are finally processed. The results obtained by applying such a multi-modal approach to fictional films turn out to outperform those obtained by relying on a single modality.

الوسائط المتعددة الحساب واللغة

Analyzing Who and What Appears in a Decade of US Cable TV News

103 - James Hong , Will Crichton , Haotian Zhang 2020

Cable TV news reaches millions of U.S. households each day, meaning that decisions about who appears on the news and what stories get covered can profoundly influence public opinion and discourse. We analyze a data set of nearly 24/7 video, audio, an d text captions from three U.S. cable TV networks (CNN, FOX, and MSNBC) from January 2010 to July 2019. Using machine learning tools, we detect faces in 244,038 hours of video, label each faces presented gender, identify prominent public figures, and align text captions to audio. We use these labels to perform screen time and word frequency analyses. For example, we find that overall, much more screen time is given to male-presenting individuals than to female-presenting individuals (2.4x in 2010 and 1.9x in 2019). We present an interactive web-based tool, accessible at https://tvnews.stanford.edu, that allows the general public to perform their own analyses on the full cable TV news data set.

أجهزة الكمبيوتر والمجتمع الوسائط المتعددة

A Crowdsourcing Procedure for the Discovery of Non-Obvious Attributes of Social Image

492 - Mark Melenhorst Delft University of Technology 2014

Research on mid-level image representations has conventionally concentrated relatively obvious attributes and overlooked non-obvious attributes, i.e., characteristics that are not readily observable when images are viewed independently of their conte xt or function. Non-obvious attributes are not necessarily easily nameable, but nonetheless they play a systematic role in people`s interpretation of images. Clusters of related non-obvious attributes, called interpretation dimensions, emerge when people are asked to compare images, and provide important insight on aspects of social images that are considered relevant. In contrast to aesthetic or affective approaches to image analysis, non-obvious attributes are not related to the personal perspective of the viewer. Instead, they encode a conventional understanding of the world, which is tacit, rather than explicitly expressed. This paper introduces a procedure for discovering non-obvious attributes using crowdsourcing. We discuss this procedure using a concrete example of a crowdsourcing task on Amazon Mechanical Turk carried out in the domain of fashion. An analysis comparing discovered non-obvious attributes with user tags demonstrated the added value delivered by our procedure.

استرجاع المعلومات الوسائط المتعددة

A Surrogate-based Generic Classifier for Chinese TV Series Reviews

257 - Yufeng Ma , Long Xia , Wenqi Shen 2016

With the emerging of various online video platforms like Youtube, Youku and LeTV, online TV series reviews become more and more important both for viewers and producers. Customers rely heavily on these reviews before selecting TV series, while produc ers use them to improve the quality. As a result, automatically classifying reviews according to different requirements evolves as a popular research topic and is essential in our daily life. In this paper, we focused on reviews of hot TV series in China and successfully trained generic classifiers based on eight predefined categories. The experimental results showed promising performance and effectiveness of its generalization to different TV series.

الحساب واللغة

Personalized TV Recommendation: Fusing User Behavior and Preferences

95 - Sheng-Chieh Lin , Ting-Wei Lin , Jing-Kai Lou 2020

In this paper, we propose a two-stage ranking approach for recommending linear TV programs. The proposed approach first leverages user viewing patterns regarding time and TV channels to identify potential candidates for recommendation and then furthe r leverages user preferences to rank these candidates given textual information about programs. To evaluate the method, we conduct empirical studies on a real-world TV dataset, the results of which demonstrate the superior performance of our model in terms of both recommendation accuracy and time efficiency.

استرجاع المعلومات التعلم الآلي التعلم الالي