An Estimation of Online Video User Engagement from Features of Continuous Emotions

60 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Lukas Stappen

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Lukas Stappen - Alice Baird - Michelle Lienhart

الوسائط المتعددة الحساب واللغة

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Portraying emotion and trustworthiness is known to increase the appeal of video content. However, the causal relationship between these signals and online user engagement is not well understood. This limited understanding is partly due to a scarcity in emotionally annotated data and the varied modalities which express user engagement online. In this contribution, we utilise a large dataset of YouTube review videos which includes ca. 600 hours of dimensional arousal, valence and trustworthiness annotations. We investigate features extracted from these signals against various user engagement indicators including views, like/dislike ratio, as well as the sentiment of comments. In doing so, we identify the positive and negative influences which single features have, as well as interpretable patterns in each dimension which relate to user engagement. Our results demonstrate that smaller boundary ranges and fluctuations for arousal lead to an increase in user engagement. Furthermore, the extracted time-series features reveal significant (p<0.05) correlations for each dimension, such as, count below signal mean (arousal), number of peaks (valence), and absolute energy (trustworthiness). From this, an effective combination of features is outlined for approaches aiming to automatically predict several user engagement indicators. In a user engagement prediction paradigm we compare all features against semi-automatic (cross-task), and automatic (task-specific) feature selection methods. These selected feature sets appear to outperform the usage of all features, e.g., using all features achieves 1.55 likes per day (Lp/d) mean absolute error from valence; this improves through semi-automatic and automatic selection to 1.33 and 1.23 Lp/d, respectively (data mean 9.72 Lp/d with a std. 28.75 Lp/d).

قيم البحث

475 - Yuanzhang Xiao , Mihaela van der Schaar 2013

Recent years have seen an explosion in wireless video communication systems. Optimization in such systems is crucial - but most existing methods intended to optimize the performance of multi-user wireless video transmission are inefficient. Some work s (e.g. Network Utility Maximization (NUM)) are myopic: they choose actions to maximize instantaneous video quality while ignoring the future impact of these actions. Such myopic solutions are known to be inferior to foresighted solutions that optimize the long-term video quality. Alternatively, foresighted solutions such as rate-distortion optimized packet scheduling focus on single-user wireless video transmission, while ignoring the resource allocation among the users. In this paper, we propose an optimal solution for performing joint foresighted resource allocation and packet scheduling among multiple users transmitting video over a shared wireless network. A key challenge in developing foresighted solutions for multiple video users is that the users decisions are coupled. To decouple the users decisions, we adopt a novel dual decomposition approach, which differs from the conventional optimization solutions such as NUM, and determines foresighted policies. Specifically, we propose an informationally-decentralized algorithm in which the network manager updates resource prices (i.e. the dual variables associated with the resource constraints), and the users make individual video packet scheduling decisions based on these prices. Because a priori knowledge of the system dynamics is almost never available at run-time, the proposed solution can learn online, concurrently with performing the foresighted optimization. Simulation results show 7 dB and 3 dB improvements in Peak Signal-to-Noise Ratio (PSNR) over myopic solutions and existing foresighted solutions, respectively.

الوسائط المتعددة

Coding local and global binary visual features extracted from video sequences

385 - Luca Baroffio , Antonio Canclini , Matteo Cesana 2015

Binary local features represent an effective alternative to real-valued descriptors, leading to comparable results for many visual analysis tasks, while being characterized by significantly lower computational complexity and memory requirements. When dealing with large collections, a more compact representation based on global features is often preferred, which can be obtained from local features by means of, e.g., the Bag-of-Visual-Word (BoVW) model. Several applications, including for example visual sensor networks and mobile augmented reality, require visual features to be transmitted over a bandwidth-limited network, thus calling for coding techniques that aim at reducing the required bit budget, while attaining a target level of efficiency. In this paper we investigate a coding scheme tailored to both local and global binary features, which aims at exploiting both spatial and temporal redundancy by means of intra- and inter-frame coding. In this respect, the proposed coding scheme can be conveniently adopted to support the Analyze-Then-Compress (ATC) paradigm. That is, visual features are extracted from the acquired content, encoded at remote nodes, and finally transmitted to a central controller that performs visual analysis. This is in contrast with the traditional approach, in which visual content is acquired at a node, compressed and then sent to a central unit for further processing, according to the Compress-Then-Analyze (CTA) paradigm. In this paper we experimentally compare ATC and CTA by means of rate-efficiency curves in the context of two different visual analysis tasks: homography estimation and content-based retrieval. Our results show that the novel ATC paradigm based on the proposed coding primitives can be competitive with CTA, especially in bandwidth limited scenarios.

الوسائط المتعددة الرؤية الحاسوبية وتمييز الأنماط

Investigating Correlations of Automatically Extracted Multimodal Features and Lecture Video Quality

141 - Jianwei Shi , Christian Otto , Anett Hoppe 2020

Ranking and recommendation of multimedia content such as videos is usually realized with respect to the relevance to a user query. However, for lecture videos and MOOCs (Massive Open Online Courses) it is not only required to retrieve relevant videos , but particularly to find lecture videos of high quality that facilitate learning, for instance, independent of the videos or speakers popularity. Thus, metadata about a lecture videos quality are crucial features for learning contexts, e.g., lecture video recommendation in search as learning scenarios. In this paper, we investigate whether automatically extracted features are correlated to quality aspects of a video. A set of scholarly videos from a Mass Open Online Course (MOOC) is analyzed regarding audio, linguistic, and visual features. Furthermore, a set of cross-modal features is proposed which are derived by combining transcripts, audio, video, and slide content. A user study is conducted to investigate the correlations between the automatically collected features and human ratings of quality aspects of a lecture video. Finally, the impact of our features on the knowledge gain of the participants is discussed.

الوسائط المتعددة

Multi-feature 360 Video Quality Estimation

345 - Roberto G. de A. Azevedo , Neil Birkbeck , Ivan Janatra 2021

We propose a new method for the visual quality assessment of 360-degree (omnidirectional) videos. The proposed method is based on computing multiple spatio-temporal objective quality features on viewports extracted from 360-degree videos. A new model is learnt to properly combine these features into a metric that closely matches subjective quality scores. The main motivations for the proposed approach are that: 1) quality metrics computed on viewports better captures the user experience than metrics computed on the projection domain; 2) the use of viewports easily supports different projection methods being used in current 360-degree video systems; and 3) no individual objective image quality metric always performs the best for all types of visual distortions, while a learned combination of them is able to adapt to different conditions. Experimental results, based on both the largest available 360-degree videos quality dataset and a cross-dataset validation, demonstrate that the proposed metric outperforms state-of-the-art 360-degree and 2D video quality metrics.

الوسائط المتعددة

A Systematic Framework for Dynamically Optimizing Multi-User Wireless Video Transmission

433 - Fangwen Fu , Mihaela van der Schaar 2009

In this paper, we formulate the collaborative multi-user wireless video transmission problem as a multi-user Markov decision process (MUMDP) by explicitly considering the users heterogeneous video traffic characteristics, time-varying network conditi ons and the resulting dynamic coupling between the wireless users. These environment dynamics are often ignored in existing multi-user video transmission solutions. To comply with the decentralized nature of wireless networks, we propose to decompose the MUMDP into local MDPs using Lagrangian relaxation. Unlike in conventional multi-user video transmission solutions stemming from the network utility maximization framework, the proposed decomposition enables each wireless user to individually solve its own dynamic cross-layer optimization (i.e. the local MDP) and the network coordinator to update the Lagrangian multipliers (i.e. resource prices) based on not only current, but also future resource needs of all users, such that the long-term video quality of all users is maximized. However, solving the MUMDP requires statistical knowledge of the experienced environment dynamics, which is often unavailable before transmission time. To overcome this obstacle, we then propose a novel online learning algorithm, which allows the wireless users to update their policies in multiple states during one time slot. This is different from conventional learning solutions, which often update one state per time slot. The proposed learning algorithm can significantly improve the learning performance, thereby dramatically improving the video quality experienced by the wireless users over time. Our simulation results demonstrate the efficiency of the proposed MUMDP framework as compared to conventional multi-user video transmission solutions.

الوسائط المتعددة