The frequent exchange of multimedia information in the present era creates an increasing demand for copyright protection. In this work, we propose a novel audio zero-watermarking technique based on the graph Fourier transform to enhance robustness in copyright protection. In this approach, the combined shift operator is used to construct the graph signal, on which graph Fourier analysis is performed. The selected maximum absolute graph Fourier coefficients, which represent the characteristics of the audio segment, are then encoded into a binary feature sequence using the K-means algorithm. Finally, the resulting binary feature sequence is XOR-ed with the binary watermark sequence to embed the zero-watermark. Experimental studies show that the proposed approach resists common or synchronization attacks more effectively than existing state-of-the-art methods.
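The last two steps of the abstract above (K-means binarisation followed by XOR) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature values are made-up stand-ins for the maximum absolute graph Fourier coefficients, a two-cluster 1-D K-means is assumed, and the helper names `two_means_bits`, `make_zero_watermark`, and `extract_watermark` are hypothetical.

```python
import numpy as np

def two_means_bits(values, iters=50):
    """Binarise 1-D values with a 2-cluster K-means (Lloyd's algorithm).

    Assumption: one bit per audio segment, 1 for the high-valued cluster.
    """
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()          # initial centroids
    for _ in range(iters):
        bits = np.abs(values - hi) < np.abs(values - lo)
        if bits.all() or not bits.any():         # degenerate: one empty cluster
            break
        new_lo, new_hi = values[~bits].mean(), values[bits].mean()
        if new_lo == lo and new_hi == hi:        # centroids stopped moving
            break
        lo, hi = new_lo, new_hi
    return bits.astype(np.uint8)

def make_zero_watermark(feature_values, watermark_bits):
    """XOR the feature bits with the watermark; the audio itself is untouched."""
    feature_bits = two_means_bits(feature_values)
    return np.bitwise_xor(feature_bits, np.asarray(watermark_bits, dtype=np.uint8))

def extract_watermark(feature_values, key_bits):
    """Recover the watermark from (possibly attacked) features and the stored key."""
    return np.bitwise_xor(two_means_bits(feature_values), key_bits)

# Illustrative values only (not taken from the paper).
features = np.array([0.10, 2.30, 0.20, 1.90, 2.50, 0.05, 0.30, 2.10])
watermark = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)
key = make_zero_watermark(features, watermark)

# A mild perturbation standing in for an attack on the audio.
attacked = features + 0.05 * np.array([1, -1, 1, -1, 1, -1, 1, -1])
recovered = extract_watermark(attacked, key)
```

Because XOR is its own inverse, the watermark is recovered whenever the attacked features fall into the same clusters as the originals, which is why the robustness of the feature extraction step matters.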
Reversible visible watermarking (RVW) is an active copyright protection mechanism. It not only transparently superimposes copyright patterns on specific positions of digital images or video frames to declare copyright ownership, but also allows the visible watermark to be completely erased, restoring the original host image without any distortion. However, existing RVW algorithms mostly construct the reversible mapping mechanism for one specific visible watermarking scheme, so they are not general. Hence, we propose a generic RVW framework, based on regularized Graph Fourier Transform (GFT) coding, that accommodates various visible watermarking schemes. In particular, we obtain a reconstruction data packet -- the compressed difference image between the watermarked image and the original host image -- which is embedded into the watermarked image via any conventional reversible data hiding method to facilitate blind recovery of the host image. The key is to compress the difference image compactly so that the reconstruction data packet can be embedded efficiently. To this end, we propose regularized GFT coding, in which the difference image is smoothed via the graph Laplacian regularizer for more efficient compression and then encoded by multi-resolution GFTs in an approximately optimal manner. Experimental results show that the proposed method achieves state-of-the-art performance with high data compression efficiency and is applicable to both gray-scale and color images. In addition, the proposed generic framework accommodates various visible watermarking algorithms, which demonstrates strong versatility.
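As a rough illustration of smoothing with a graph Laplacian regularizer, the sketch below solves min_x ||x - y||^2 + mu * x^T L x on a 4-connected pixel grid, whose closed-form solution is x = (I + mu L)^{-1} y. The abstract does not state the graph construction or the objective, so the unweighted grid graph, the parameter `mu`, and all function names here are assumptions.

```python
import numpy as np

def path_laplacian(n):
    """Combinatorial Laplacian of an unweighted path graph with n nodes."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

def grid_laplacian(h, w):
    """Laplacian of the 4-connected h x w grid (row-major pixel order)."""
    return (np.kron(path_laplacian(h), np.eye(w))
            + np.kron(np.eye(h), path_laplacian(w)))

def laplacian_smooth(patch, mu=1.0):
    """Solve (I + mu * L) x = y, the minimiser of ||x - y||^2 + mu * x^T L x."""
    h, w = patch.shape
    L = grid_laplacian(h, w)
    y = patch.reshape(-1).astype(float)
    x = np.linalg.solve(np.eye(h * w) + mu * L, y)
    return x.reshape(h, w), L

def roughness(img, L):
    """Graph Laplacian quadratic form x^T L x (sum of squared edge differences)."""
    v = img.reshape(-1).astype(float)
    return float(v @ L @ v)

# Illustrative "difference image" patch (values are made up).
patch = np.array([[0.0, 0.0, 9.0, 0.0],
                  [0.0, 8.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0, 7.0]])
smoothed, L = laplacian_smooth(patch, mu=2.0)
```

A smoother signal concentrates its energy in fewer low-frequency GFT coefficients, which is the intuition behind smoothing before coding. Note that smoothing alone is lossy; how the framework accounts for this to keep the scheme reversible is not described in the abstract.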
As an important component of multimedia analysis tasks, audio classification aims to discriminate between different audio signal types and has received intensive attention due to its wide range of applications. Generally speaking, the raw signal can be transformed into various representations (such as the Short Time Fourier Transform and Mel Frequency Cepstral Coefficients), and the information implied in different representations can be complementary. Ensembling models trained on different representations can greatly boost classification performance. However, making inferences with a large number of models is cumbersome and computationally expensive. In this paper, we propose a novel end-to-end collaborative learning framework for the audio classification task. The framework takes multiple representations as input to train the models in parallel. The complementary information provided by different representations is shared through knowledge distillation. Consequently, the performance of each model can be significantly improved without increasing the computational overhead at the inference stage. Extensive experimental results demonstrate that the proposed approach improves classification performance and achieves state-of-the-art results on both acoustic scene classification tasks and general audio tagging tasks.
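One common way to share information between peer models is a distillation term added to each model's supervised loss. The sketch below assumes a temperature-softened KL term between two peers' logits; the abstract does not give the actual loss, so the form of `collaborative_loss`, the temperature `T`, the weight `alpha`, and the example logits are all assumptions.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax along the last axis."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)        # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean cross-entropy against integer class labels."""
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def kl_div(p, q):
    """Mean KL(p || q) over the batch."""
    return np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1))

def collaborative_loss(own_logits, peer_logits, labels, T=2.0, alpha=0.5):
    """Supervised loss plus a distillation term pulling towards the peer.

    Assumption: the peer's softened prediction acts as the teacher signal.
    """
    hard = cross_entropy(own_logits, labels)
    soft = kl_div(softmax(peer_logits, T), softmax(own_logits, T)) * T * T
    return (1 - alpha) * hard + alpha * soft

# Made-up logits from two models fed different representations of the same clips.
logits_stft = np.array([[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]])
logits_mfcc = np.array([[1.0, 1.2, -0.5], [0.0, 0.2, 2.0]])
labels = np.array([0, 1])

loss_stft = collaborative_loss(logits_stft, logits_mfcc, labels)
loss_mfcc = collaborative_loss(logits_mfcc, logits_stft, labels)
```

During training each model would minimise its own loss; at inference only one model is kept, so the cost matches that of a single network.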
Steganography comprises the mechanics of hiding data in a host medium that may be publicly available. While previous works focused on unimodal setups (e.g., hiding images in images, or hiding audio in audio), PixInWav targets the multimodal case of hiding images in audio. To this end, we propose a novel residual architecture operating on top of short-time discrete cosine transform (STDCT) audio spectrograms. Among our results, we find that the residual audio steganography setup we propose allows the hidden image to be encoded independently of the host audio without compromising quality. Accordingly, while previous works require both the host and hidden signals in order to hide a signal, PixInWav can encode images offline -- which can later be hidden, in a residual fashion, in any audio signal. Finally, we test our scheme in a lab setting by transmitting images over the air from a loudspeaker to a microphone, verifying our theoretical insights and obtaining promising results.
Digital watermarks have been considered a promising way to fight software piracy. Graph-based watermarking schemes encode authorship/ownership data as the control-flow graph of dummy code. In 2012, Chroni and Nikolopoulos developed an ingenious scheme of this kind, which was claimed to withstand attacks in the form of a single edge removal. We extend the work of those authors in various aspects. First, we give a formal characterization of the class of graphs generated by their encoding function. Then, we formulate a linear-time algorithm which recovers from ill-intentioned removals of $k \leq 2$ edges, thereby proving their claim. Furthermore, we provide a simpler decoding function and an algorithm to restore watermarks with an arbitrary number of missing edges whenever at all possible. By disclosing and improving upon the resilience of Chroni and Nikolopoulos's watermark, our results reinforce the interest in regarding it as a possible solution to numerous applications.
In this paper, we redefine the Graph Fourier Transform (GFT) under the DSP$_\mathrm{G}$ framework. We consider the Jordan eigenvectors of the directed Laplacian as graph harmonics and the corresponding eigenvalues as the graph frequencies. For this purpose, we propose a shift operator based on the directed Laplacian of a graph. Based on our shift operator, we then define the total variation of graph signals, which is used in frequency ordering. The proposed definition of the GFT yields a natural frequency ordering and interpretation. Moreover, we show that our proposed shift operator makes the linear shift-invariant (LSI) filters under DSP$_\mathrm{G}$ polynomial in the directed Laplacian.
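The definition above can be sketched numerically for the diagonalisable case. The code below assumes the directed Laplacian is the in-degree matrix minus the weight matrix and that frequencies are ordered by eigenvalue magnitude; neither convention is spelled out in the abstract, and the function names are hypothetical. `np.linalg.eig` does not compute a Jordan decomposition, so this sketch does not cover defective Laplacians.

```python
import numpy as np

def directed_laplacian(W):
    """L = D_in - W.

    Assumption: W[i, j] is the weight of the edge from node j to node i,
    so row sums are in-degrees.
    """
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W

def gft_basis(W):
    """Eigenvalues (graph frequencies) and eigenvectors (graph harmonics) of L.

    Assumption: frequencies are sorted by |lambda|, low to high.
    Only valid when L is diagonalisable.
    """
    eigvals, V = np.linalg.eig(directed_laplacian(W))
    order = np.argsort(np.abs(eigvals))
    return eigvals[order], V[:, order]

def gft(x, V):
    """Forward transform: coefficients x_hat with V @ x_hat = x."""
    return np.linalg.solve(V, np.asarray(x, dtype=complex))

def igft(x_hat, V):
    """Inverse transform: synthesise the signal from its coefficients."""
    return V @ x_hat

# Directed cycle on 8 nodes: one edge from node i-1 to node i.
N = 8
W = np.roll(np.eye(N), 1, axis=0)
freqs, V = gft_basis(W)

x = np.ones(N)                 # constant graph signal
x_hat = gft(x, V)              # energy should sit at the zero frequency only
x_rec = igft(x_hat, V)
```

On the directed cycle the harmonics coincide with the DFT basis vectors up to scaling, so a constant signal maps to a single coefficient at the zero frequency, which matches the intuition of a natural low-to-high frequency ordering.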