Video watermarking embeds a message into a cover video in an imperceptible manner, which can be retrieved even if the video undergoes certain modifications or distortions. Traditional watermarking methods are often manually designed for particular types of distortions and thus cannot simultaneously handle a broad spectrum of distortions. To this end, we propose a robust deep learning-based solution for video watermarking that is end-to-end trainable. Our model consists of a novel multiscale design where the watermarks are distributed across multiple spatial-temporal scales. It gains robustness against various distortions through a differentiable distortion layer, whereas non-differentiable distortions, such as popular video compression standards, are modeled by a differentiable proxy. Extensive evaluations on a wide variety of distortions show that our method outperforms traditional video watermarking methods as well as deep image watermarking models by a large margin. We further demonstrate the practicality of our method on a realistic video-editing application.
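To make the distortion-layer idea concrete, here is a minimal sketch, assuming PyTorch, of how a frozen CNN proxy can stand in for a non-differentiable codec so that gradients flow from the message extractor back to the embedder. All module names, sizes, and loss weights are illustrative assumptions, and a single frame stands in for a video clip:

    import torch
    import torch.nn as nn

    class CodecProxy(nn.Module):
        """Small CNN pre-trained elsewhere to mimic the codec; frozen here."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 3, 3, padding=1))

        def forward(self, x):
            return self.net(x)

    embedder = nn.Conv2d(3 + 1, 3, 3, padding=1)   # toy stand-in embedder
    extractor = nn.Conv2d(3, 1, 3, padding=1)      # toy stand-in extractor
    proxy = CodecProxy()
    for p in proxy.parameters():                   # the proxy is trained separately,
        p.requires_grad_(False)                    # then frozen for this stage

    opt = torch.optim.Adam(
        [*embedder.parameters(), *extractor.parameters()], lr=1e-4)
    frame = torch.rand(1, 3, 64, 64)               # toy cover frame
    message = torch.randint(0, 2, (1, 1, 64, 64)).float()

    opt.zero_grad()
    marked = embedder(torch.cat([frame, message], dim=1))
    distorted = proxy(marked)                      # differentiable "compression"
    recovered = extractor(distorted)
    loss = nn.functional.binary_cross_entropy_with_logits(recovered, message) \
         + nn.functional.mse_loss(marked, frame)   # imperceptibility term
    loss.backward()                                # gradients reach the embedder
    opt.step()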
Due to the rapid growth of machine learning tools, and specifically deep networks, in various computer vision and image processing areas, applications of Convolutional Neural Networks to watermarking have recently emerged. In this paper, we propose a deep end-to-end diffusion watermarking framework (ReDMark) that can be adapted to any desired transform space. The framework is composed of two Fully Convolutional Neural Networks with residual structure for embedding and extraction. The whole deep network is trained end-to-end to perform blind, secure watermarking. The framework is customizable for the trade-off between robustness and imperceptibility, and likewise adjustable for the trade-off between capacity and robustness. The proposed framework simulates various attacks as a differentiable network layer to facilitate end-to-end training. For the JPEG attack, a differentiable approximation is utilized, which drastically improves the watermarking robustness to this attack. Another important characteristic of the proposed framework, which leads to improved security and robustness, is its capability to diffuse watermark information over a relatively wide area of the image. Comparative results against recent state-of-the-art methods highlight the superiority of the proposed framework in terms of imperceptibility and robustness.
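The differentiable JPEG approximation is the crux of robustness to that attack: real quantization rounds DCT coefficients, and rounding has zero gradient almost everywhere. A hedged sketch, assuming PyTorch and a straight-through rounding estimator (one common approximation, not necessarily the one ReDMark uses), where the forward pass keeps the rounding while the backward pass treats it as the identity:

    import math
    import torch

    def dct_matrix(n=8):
        # Orthonormal DCT-II basis as an n x n matrix.
        k = torch.arange(n, dtype=torch.float32)
        basis = torch.cos(math.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
        basis[0] *= 1 / math.sqrt(2.0)
        return basis * math.sqrt(2.0 / n)

    def soft_round(x):
        # Straight-through estimator: round forward, identity gradient backward.
        return x + (torch.round(x) - x).detach()

    def jpeg_like(block, q=20.0):
        D = dct_matrix(block.shape[-1])
        coeffs = D @ block @ D.T               # 2-D DCT of an 8x8 block
        coeffs = soft_round(coeffs / q) * q    # differentiable quantization
        return D.T @ coeffs @ D                # inverse 2-D DCT

    block = 255 * torch.rand(8, 8)             # toy pixel block
    block.requires_grad_(True)
    jpeg_like(block).sum().backward()          # gradients flow through the "attack"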
Advances in digital technologies have made it possible to produce perfect copies of digital content. In this environment, malicious users reproduce digital content and share it without compensating the content owner. Content owners are concerned about the potential loss of revenue and reputation from piracy, especially when the content is available over the Internet. Digital watermarking has emerged as a deterrent against such malicious activities. Several methods have been proposed for copyright protection and fingerprinting of digital images. However, these methods are not applicable to text documents, as such documents lack the rich texture information that is abundant in digital images. In this paper, a framework (mPDF) is proposed that facilitates the use of digital image watermarking algorithms on text documents. The proposed method divides a text document into texture and non-texture blocks using an energy-based approach. After classification, a watermark is embedded inside the texture blocks in a content-adaptive manner. The proposed method is integrated with five known image watermarking methods, and its performance is studied in terms of quality and robustness. Experiments are conducted on documents in 11 different languages. Experimental results clearly show that the proposed method facilitates the use of image watermarking algorithms on text documents and is robust against attacks such as print & scan, print screen, and skew. Moreover, the proposed method overcomes the drawbacks of existing text watermarking methods, such as the need for manual inspection and language dependency.
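As an illustration of the energy-based split, here is a small sketch in Python/NumPy; the block size, threshold, and gradient-energy measure are assumptions for illustration, not the paper's exact choices:

    import numpy as np

    def classify_blocks(gray, block=32, energy_thresh=50.0):
        # Label each block of a grayscale page as texture or non-texture
        # by thresholding its mean gradient energy.
        h, w = gray.shape
        labels = {}
        for y in range(0, h - block + 1, block):
            for x in range(0, w - block + 1, block):
                patch = gray[y:y + block, x:x + block].astype(np.float64)
                gy, gx = np.gradient(patch)
                energy = np.mean(gy ** 2 + gx ** 2)
                labels[(y, x)] = "texture" if energy > energy_thresh else "non-texture"
        return labels

    page = np.random.rand(128, 128) * 255      # stand-in for a scanned page
    labels = classify_blocks(page)
    print(sum(v == "texture" for v in labels.values()), "texture blocks")

A watermark would then be embedded only in the blocks labeled "texture", leaving the sparse, easily disturbed non-texture regions untouched.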
Digital image watermarking is the process of covertly embedding a watermark in a carrier image and later extracting it. Incorporating deep learning networks into image watermarking has attracted increasing attention in recent years. However, existing deep learning-based watermarking systems cannot achieve robustness, blindness, and automated embedding and extraction simultaneously. In this paper, a fully automated image watermarking system based on deep neural networks is proposed to generalize the image watermarking process. An unsupervised deep learning structure and a novel loss computation are proposed to achieve high capacity and high robustness without any prior knowledge of possible attacks. Furthermore, a challenging application, watermark extraction from camera-captured images, is provided to validate the practicality as well as the robustness of the proposed system. Experimental results show the superior performance of the proposed system compared with several currently available techniques.
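A minimal sketch, assuming PyTorch, of the general shape of such a loss: an imperceptibility term on the marked image plus a message-recovery term measured after a crude capture-like perturbation. The perturbation, the toy extractor, and the unit weights are all illustrative assumptions, not the paper's design:

    import torch
    import torch.nn.functional as F

    def capture_like(x):
        # Crude stand-in for camera capture: brightness gain plus sensor noise.
        gain = 1.0 + 0.1 * (torch.rand(1) - 0.5)
        return (x * gain + 0.02 * torch.randn_like(x)).clamp(0.0, 1.0)

    cover = torch.rand(1, 3, 64, 64)
    marked = (cover + 0.01 * torch.randn_like(cover)).clamp(0, 1)
    marked.requires_grad_(True)                # toy embedder output
    extractor = torch.nn.Linear(3 * 64 * 64, 32)
    bits = torch.randint(0, 2, (1, 32)).float()

    logits = extractor(capture_like(marked).flatten(1))
    loss = F.mse_loss(marked, cover) \
         + F.binary_cross_entropy_with_logits(logits, bits)
    loss.backward()                            # trains embedder and extractor jointly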
Motivated by the prowess of deep learning (DL) based techniques in prediction, generalization, and representation learning, we develop a novel framework called DeepQoE to predict video quality of experience (QoE). The end-to-end framework first uses a combination of DL techniques (e.g., word embeddings) to extract generalized features. Next, these features are combined and fed into a neural network for representation learning. Such representations serve as inputs for classification or regression tasks. Evaluating the performance of DeepQoE on two datasets, we show that for the small dataset, the accuracy of all shallow learning algorithms is improved by using the representations derived from DeepQoE. For the large dataset, our DeepQoE framework achieves a significant performance improvement over the best baseline method (90.94% vs. 82.84%). Moreover, DeepQoE, also released as an open-source tool, provides video QoE research with much-needed flexibility in fitting different datasets, extracting generalized features, and learning representations.
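The fusion pattern the abstract describes can be sketched as follows, assuming PyTorch; layer sizes, feature choices, and names such as QoENet are illustrative assumptions:

    import torch
    import torch.nn as nn

    class QoENet(nn.Module):
        def __init__(self, n_genres=20, emb_dim=8, n_numeric=4, n_classes=3):
            super().__init__()
            self.genre_emb = nn.Embedding(n_genres, emb_dim)  # embedding-style feature
            self.repr = nn.Sequential(
                nn.Linear(emb_dim + n_numeric, 32), nn.ReLU())
            self.head = nn.Linear(32, n_classes)

        def forward(self, genre_id, numeric):
            z = torch.cat([self.genre_emb(genre_id), numeric], dim=-1)
            h = self.repr(z)        # shared representation, reusable as input
            return self.head(h)     # features for shallow learners

    net = QoENet()
    scores = net(torch.tensor([3]), torch.rand(1, 4))  # e.g., bitrate, resolution, ...

The intermediate representation h, rather than the raw inputs, is what a shallow classifier would consume in the small-dataset setting the abstract mentions.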
In this paper, we formulate the collaborative multi-user wireless video transmission problem as a multi-user Markov decision process (MUMDP) by explicitly considering the users' heterogeneous video traffic characteristics, time-varying network conditions, and the resulting dynamic coupling between the wireless users. These environment dynamics are often ignored in existing multi-user video transmission solutions. To comply with the decentralized nature of wireless networks, we propose to decompose the MUMDP into local MDPs using Lagrangian relaxation. Unlike conventional multi-user video transmission solutions stemming from the network utility maximization framework, the proposed decomposition enables each wireless user to individually solve its own dynamic cross-layer optimization (i.e., the local MDP) and the network coordinator to update the Lagrangian multipliers (i.e., resource prices) based on not only the current but also the future resource needs of all users, such that the long-term video quality of all users is maximized. However, solving the MUMDP requires statistical knowledge of the experienced environment dynamics, which is often unavailable before transmission time. To overcome this obstacle, we then propose a novel online learning algorithm that allows the wireless users to update their policies in multiple states during one time slot. This differs from conventional learning solutions, which often update only one state per time slot. The proposed learning algorithm can significantly improve learning performance, thereby dramatically improving the video quality experienced by the wireless users over time. Our simulation results demonstrate the efficiency of the proposed MUMDP framework compared to conventional multi-user video transmission solutions.
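The Lagrangian decomposition admits a compact illustration. A sketch in Python, not the paper's exact algorithm: each user solves its local problem given the current resource price, and the coordinator updates the price with a subgradient step on the coupling (capacity) constraint:

    import numpy as np

    def local_best_response(marginal_utility, price):
        # Each user requests resource units while marginal utility exceeds the price.
        return float(np.sum(marginal_utility > price))

    capacity = 5.0
    price = 0.0                                     # Lagrangian multiplier
    users = [np.sort(np.random.rand(8))[::-1] for _ in range(3)]

    for _ in range(100):
        demand = sum(local_best_response(u, price) for u in users)
        price = max(0.0, price + 0.05 * (demand - capacity))  # dual ascent

    print(f"price={price:.3f}, demand={demand}")

The price rises while aggregate demand exceeds capacity and falls otherwise, steering the decentralized users toward a feasible allocation without the coordinator ever seeing their private utilities.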