Video content has become a critical tool for promoting products in e-commerce. However, the lack of automatic promotional video generation solutions makes large-scale video-based promotion campaigns infeasible. The first step in automatically producing promotional videos is to generate visual storylines, that is, to select building-block footage and arrange it in an appropriate order. This task depends on the subjective viewing experience. It has hitherto been performed by human experts and is thus hard to scale. To address this problem, we propose WundtBackpack, an algorithmic approach to generating storylines from available visual materials, which can be video clips or images. It consists of two main parts: 1) the Learnable Wundt Curve, which evaluates perceived persuasiveness based on the stimulus intensity of a sequence of visual materials and requires only a small volume of training data; and 2) a clustering-based backpacking algorithm, which generates persuasive sequences of visual materials while respecting video length constraints. In this way, the proposed approach provides a dynamic structure that enables artificial intelligence (AI) to organize video footage into a sequence of visual stimuli with persuasive power. Extensive real-world experiments show that our approach achieves close to 10% higher perceived persuasiveness scores from human testers, and 12.5% higher expected revenue, compared to the best-performing state-of-the-art approach.
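As a rough illustration of the two parts described above, the following Python sketch scores stimulus intensity with a classic Wundt-style curve (a reward sigmoid minus a punishment sigmoid) and then picks clips under a length budget. It is not the authors' method: the function names, the curve parameters, and the normalized stimulus values are all assumptions, the curve is fixed rather than learned, and a plain 0/1 knapsack stands in for the clustering-based backpacking algorithm. The sketch also ignores the ordering of clips, which the abstract treats as part of the task.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def wundt(stimulus, reward_mid=0.3, punish_mid=0.7, slope=10.0, punish_weight=1.2):
    """Hypothetical Wundt-style score: reward minus punishment.

    All parameter values are illustrative assumptions, not learned values.
    The result peaks at moderate stimulus and drops for very high stimulus.
    """
    reward = sigmoid(slope * (stimulus - reward_mid))
    punish = punish_weight * sigmoid(slope * (stimulus - punish_mid))
    return reward - punish

def select_clips(clips, max_seconds):
    """0/1 knapsack over clips given as (name, seconds, stimulus) tuples.

    Returns (total_score, chosen_names) with total length <= max_seconds.
    Clip lengths are assumed to be whole seconds.
    """
    # best[cap] holds the best (score, names) using at most `cap` seconds.
    best = [(0.0, [])] * (max_seconds + 1)
    for name, seconds, stimulus in clips:
        score = wundt(stimulus)
        # Iterate capacities downwards so each clip is used at most once.
        for cap in range(max_seconds, seconds - 1, -1):
            candidate = best[cap - seconds][0] + score
            if candidate > best[cap][0]:
                best[cap] = (candidate, best[cap - seconds][1] + [name])
    return best[max_seconds]

clips = [("a", 5, 0.5), ("b", 5, 0.5), ("c", 8, 1.0)]
total_score, chosen = select_clips(clips, max_seconds=10)
```

In this toy input, clip "c" has an overly intense stimulus and receives a negative score, so the selection prefers the two moderate clips that together fill the budget.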
Omnidirectional (or 360-degree) images and videos are emergent signals in many areas such as robotics and virtual/augmented reality. In particular, for virtual reality, they allow an immersive experience in which the user is provided with a 360-degree field of view and can navigate throughout a scene, e.g., through the use of Head Mounted Displays. Since it represents the full 360-degree field of view from one point of the scene, omnidirectional content is naturally represented as spherical visual signals. Current approaches for capturing, processing, delivering, and displaying 360-degree content, however, present many open technical challenges and introduce several types of distortions in these visual signals. Some of the distortions are specific to the nature of 360-degree images, and often differ from those encountered in the classical image communication framework. This paper provides a first comprehensive review of the most common visual distortions that alter 360-degree signals undergoing state-of-the-art processing in common applications. While their impact on viewers' visual perception and on the immersive experience at large is still unknown, and thus remains an open research topic, this review serves the purpose of identifying the main causes of visual distortions in the end-to-end 360-degree content distribution pipeline. It is essential as a basis for benchmarking different processing techniques, allowing the effective design of new algorithms and applications. It is also necessary for the deployment of proper psychovisual studies to characterise the human perception of these new images in interactive and immersive applications.
High Efficiency Video Coding (HEVC) encryption has been proposed to protect video by encrypting its syntax elements. To achieve high video security, to the best of our knowledge, almost all existing HEVC encryption algorithms encrypt the whole video, so that users without permission cannot obtain any viewable information. However, these algorithms cannot meet the needs of customers who require part, but not all, of the information in the video. In many cases, such as professional paid videos or video meetings, users would like some information from the original video to remain visible in the encrypted video. To meet this demand, this paper proposes a multi-level encryption scheme composed of lightweight, medium and heavyweight encryption, where each level exposes a different amount of visual information. It is found that either encrypting the luma intra prediction mode (IPM) or scrambling the sign syntax element of the DCT coefficients yields a distorted video that still retains residual visual information, while encrypting both yields strong encryption from which no visual information can be obtained. The experimental results match our expectations, indicating that each encryption level exposes a different amount of visual information. Users can therefore flexibly choose the encryption level according to their requirements.
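To make the idea of scrambling coefficient signs concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not the paper's scheme: it operates on a plain list of integer coefficients rather than on an HEVC bitstream, and the keystream is derived from SHA-256 in a counter arrangement chosen only for the demonstration. The names `keystream_bits` and `scramble_signs` are hypothetical.

```python
import hashlib

def keystream_bits(key: bytes, n: int):
    """Derive n pseudo-random bits from `key` (illustrative construction only)."""
    bits = []
    counter = 0
    while len(bits) < n:
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        for byte in block:
            for i in range(8):
                bits.append((byte >> i) & 1)
        counter += 1
    return bits[:n]

def scramble_signs(coeffs, key: bytes):
    """Flip the sign of each coefficient whose keystream bit is 1.

    Magnitudes are untouched, so applying the function twice with the
    same key restores the original coefficients.
    """
    bits = keystream_bits(key, len(coeffs))
    return [-c if b else c for c, b in zip(coeffs, bits)]

coeffs = [12, -7, 3, 0, -1, 5, 2, -9]
encrypted = scramble_signs(coeffs, b"demo-key")
decrypted = scramble_signs(encrypted, b"demo-key")
```

Because only signs change, the magnitudes of the coefficients survive, which is consistent with the abstract's observation that sign scrambling alone leaves residual visual information.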
The latest High Efficiency Video Coding (HEVC) standard has been increasingly applied to generate video streams over the Internet. However, HEVC compressed videos may incur severe quality degradation, particularly at low bit-rates. Thus, it is necessary to enhance the visual quality of HEVC videos at the decoder side. To this end, this paper proposes a Quality Enhancement Convolutional Neural Network (QE-CNN) method that does not require any modification of the encoder to achieve quality enhancement for HEVC. In particular, our QE-CNN method learns QE-CNN-I and QE-CNN-P models to reduce the distortion of HEVC I and P frames, respectively. The proposed method differs from existing CNN-based quality enhancement approaches, which only handle intra-coding distortion and are thus not suitable for P frames. Our experimental results validate that our QE-CNN method is effective in enhancing quality for both I and P frames of HEVC videos. To apply our QE-CNN method in time-constrained scenarios, we further propose a Time-constrained Quality Enhancement Optimization (TQEO) scheme. Our TQEO scheme controls the computational time of QE-CNN to meet a target while maximizing the quality enhancement. Further experimental results demonstrate the effectiveness of our TQEO scheme in terms of time control accuracy and quality enhancement under different time constraints. Finally, we design a prototype to implement our TQEO scheme in a real-time scenario.
Recent years have witnessed an explosion of science conspiracy videos on the Internet, challenging science epistemology and public understanding of science. Scholars have started to examine the persuasion techniques used in conspiracy messages, such as uncertainty and fear, yet little is understood about the visual narratives, especially how visual narratives differ between videos that debunk conspiracies and those that propagate them. This paper addresses this gap in understanding visual framing in conspiracy videos by analyzing millions of frames from conspiracy and counter-conspiracy YouTube videos using computational methods. We found that conspiracy videos tended to use lower color variance and brightness, especially in thumbnails and earlier parts of the videos. This paper also demonstrates how researchers can integrate textual and visual features for identifying conspiracies on social media and discusses the implications of computational modeling for scholars interested in studying visual manipulation in the digital era.
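The two visual features named above, brightness and color variance, can be computed per frame in many ways, and the abstract does not say which definitions were used. The Python sketch below is one plausible reading under stated assumptions: a frame is a flat list of (R, G, B) tuples with values in 0..255, brightness is the mean luma using the Rec. 601 weights, and color variance is the population variance of each channel averaged over the three channels. The function name `frame_features` is hypothetical.

```python
from statistics import mean, pvariance

def frame_features(pixels):
    """Return (brightness, color_variance) for a list of (r, g, b) pixels.

    Assumed definitions (not taken from the paper):
      brightness     = mean Rec. 601 luma over all pixels
      color_variance = per-channel population variance, averaged over R, G, B
    """
    luma = [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels]
    brightness = mean(luma)
    # zip(*pixels) regroups the pixel tuples into three per-channel sequences.
    color_variance = mean(pvariance(channel) for channel in zip(*pixels))
    return brightness, color_variance

flat_gray = [(100, 100, 100)] * 4
colorful = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]

gray_brightness, gray_variance = frame_features(flat_gray)
color_brightness, color_variance = frame_features(colorful)
```

A uniform gray frame has zero color variance, while the mixed frame has a large one. Aggregating such per-frame values over thumbnails and early segments would be one way to approach the comparison the abstract describes.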
We develop optimal economical caching schemes for cache-enabled heterogeneous networks that deliver multimedia video services with personalized viewing qualities to mobile users. By applying scalable video coding (SVC), each video file to be requested is divided into one base layer (BL) and several enhancement layers (ELs). In order to assign different transmission tasks, the serving small-cell base stations (SBSs) are grouped into K clusters. The SBSs are able to cache and cooperatively transmit BL and EL contents to the user. We analytically derive expressions for the successful transmission probability and the ergodic service rate, and then obtain the closed form of the EConomical Efficiency (ECE). In order to enhance the ECE performance, we formulate ECE optimization problems for two cases. In the first case, with an equal cache size at each SBS, the layer caching indicator is determined. Since this problem is NP-hard, we apply an l0-norm approximation and relax the discrete optimization variables to be continuous; the relaxed problem is convex. Next, based on the optimal solution of the relaxed problem, we devise a greedy-strategy-based heuristic algorithm to obtain near-optimal layer caching indicators. In the second case, the cache size of each SBS, the layer size and the layer caching indicator are jointly optimized. This is a mixed integer programming problem, which is more challenging. To solve it effectively, the original ECE maximization problem is divided into two subproblems, which are solved iteratively until the original optimization problem converges. Numerical results verify the correctness of the theoretical derivations. Additionally, the proposed SVC-based caching schemes are shown to outperform the most popular layer placement strategy.