Neural Linguistic Steganography

Published by Zachary Ziegler

Publication date: 2019

Research field: Informatics Engineering

Paper language: English

Authors: Zachary M. Ziegler, Yuntian Deng, Alexander M. Rush

Computation and Language; Cryptography and Security; Machine Learning

Abstract

Whereas traditional cryptography encrypts a secret message into an unintelligible form, steganography conceals that communication is taking place by encoding a secret message into a cover signal. Language is a particularly pragmatic cover signal due to its benign occurrence and independence from any one medium. Traditionally, linguistic steganography systems encode secret messages in existing text via synonym substitution or word order rearrangements. Advances in neural language models enable previously impractical generation-based techniques. We propose a steganography technique based on arithmetic coding with large-scale neural language models. We find that our approach can generate realistic looking cover sentences as evaluated by humans, while at the same time preserving security by matching the cover message distribution with the language model distribution.
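A minimal sketch of arithmetic-coding steganography, assuming a hypothetical toy next-token model `toy_lm` in place of a large neural LM and exact rational arithmetic (`fractions.Fraction`) instead of the finite-precision, truncated coder a practical system would need. The secret bits are read as a binary fraction; the sender repeatedly narrows an interval by the model's probabilities (arithmetic *decoding*) until the interval pins down the message, and the receiver replays the same model to recover the bits. It also assumes the receiver knows the message length `n`. This is an illustration of the technique, not the authors' implementation. When the message bits are uniformly random, each token is chosen with probability equal to its model probability, which is the intuition behind matching the cover distribution to the LM distribution.

```python
from fractions import Fraction

VOCAB = ["the", "cat", "dog", "sat", "ran", "."]


def toy_lm(context):
    """Hypothetical stand-in for a neural LM: exact next-token probabilities."""
    weights = [1] * len(VOCAB)
    if context:
        # Boost the token after the previous one in VOCAB so that the
        # distribution actually depends on the context.
        weights[(VOCAB.index(context[-1]) + 1) % len(VOCAB)] = 3
    total = sum(weights)
    return [(tok, Fraction(w, total)) for tok, w in zip(VOCAB, weights)]


def encode(bits, lm=toy_lm):
    """Turn secret bits into cover tokens by arithmetic decoding with the LM."""
    n = len(bits)
    width = Fraction(1, 2 ** n)
    x = int("".join(map(str, bits)) or "0", 2) * width  # start of message interval
    target = x + width / 2                                # its midpoint
    low, high = Fraction(0), Fraction(1)
    tokens = []
    # Stop once [low, high) lies inside [x, x + width): the emitted tokens
    # then determine the first n binary digits uniquely.
    while not (x <= low and high <= x + width):
        span, cursor = high - low, low
        for tok, p in lm(tokens):
            nxt = cursor + span * p
            if cursor <= target < nxt:  # sub-interval containing the midpoint
                low, high = cursor, nxt
                tokens.append(tok)
                break
            cursor = nxt
    return tokens


def decode(tokens, n, lm=toy_lm):
    """Recover n secret bits by replaying the same interval narrowing."""
    low, high = Fraction(0), Fraction(1)
    for i, tok in enumerate(tokens):
        span, cursor = high - low, low
        for cand, p in lm(tokens[:i]):
            nxt = cursor + span * p
            if cand == tok:
                low, high = cursor, nxt
                break
            cursor = nxt
    if n == 0:
        return []
    value = int(low * 2 ** n)  # floor, because low lies in [x, x + 2**-n)
    return [int(b) for b in format(value, f"0{n}b")]
```

Usage: `decode(encode([1, 0, 1, 1]), 4)` returns `[1, 0, 1, 1]`. Exact fractions keep the sketch short and provably invertible; a real coder would use fixed-width integer intervals and restrict each step to the most likely tokens.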

Related papers

Jiaming Shen, Heng Ji, Jiawei Han (2020)

Linguistic steganography studies how to hide secret messages in natural language cover texts. Traditional methods aim to transform a secret message into an innocent text via lexical substitution or syntactical modification. Recently, advances in neural language models (LMs) enable us to directly generate cover text conditioned on the secret message. In this study, we present a new linguistic steganography method which encodes secret messages using self-adjusting arithmetic coding based on a neural language model. We formally analyze the statistical imperceptibility of this method and empirically show it outperforms the previous state-of-the-art methods on four datasets by 15.3% and 38.9% in terms of bits/word and KL metrics, respectively. Finally, human evaluations show that 51% of generated cover texts can indeed fool eavesdroppers.

Computation and Language; Cryptography and Security
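The abstract does not say what "self-adjusting" adjusts. One plausible reading, stated here as an assumption, is that at every step the coder chooses how many of the most likely tokens to keep, so that the distortion introduced by truncation stays under a budget δ. The sketch shows only that per-step choice, measured as the KL divergence between the truncated, renormalized distribution and the full one; `adaptive_truncation` and `delta` are hypothetical names, and the arithmetic coder itself is omitted.

```python
import math


def adaptive_truncation(probs, delta):
    """Smallest top-k token set whose renormalized distribution q keeps
    KL(q || p) <= delta (natural log).

    For q = p restricted to the kept set K and divided by its mass Z,
    KL(q || p) = sum_{i in K} (p_i / Z) * log(1 / Z) = -log Z,
    so the bound holds exactly when Z >= exp(-delta).
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    threshold = math.exp(-delta)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= threshold:
            break
    return kept, -math.log(mass)
```

A confident step (one token with most of the mass) keeps few candidates and embeds few bits, while a flat step keeps many; the budget δ trades embedding rate against statistical distance.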

Provably Secure Generative Linguistic Steganography

Siyu Zhang, Zhongliang Yang, Jinshuai Yang (2021)

Generative linguistic steganography mainly utilizes language models and applies steganographic sampling (stegosampling) to generate high-security steganographic text (stegotext). However, previous methods generally lead to statistical differences between the conditional probability distributions of stegotext and natural text, which brings about security risks. In this paper, to further ensure security, we present a novel provably secure generative linguistic steganographic method ADG, which recursively embeds secret information by Adaptive Dynamic Grouping of tokens according to their probability given by an off-the-shelf language model. We not only prove the security of ADG mathematically, but also conduct extensive experiments on three public corpora to further verify its imperceptibility. The experimental results reveal that the proposed method is able to generate stegotext with nearly perfect security.

Computation and Language; Cryptography and Security
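The abstract describes ADG only at a high level. The sketch is a simplified, hypothetical illustration of the grouping idea: tokens are split into 2**r groups of roughly equal probability mass, the next r secret bits select a group, and a token is sampled from inside that group in proportion to its probability. The choice of r, the greedy balancing rule and all function names are assumptions; exact equal-mass grouping is generally impossible, and this sketch does not reproduce the paper's recursive construction or its security proof. It assumes strictly positive probabilities summing to 1, at least r remaining secret bits, and identical probabilities on both sides.

```python
import math
import random


def group_tokens(probs):
    """Hypothetical grouping: 2**r groups of roughly equal probability mass.

    r = floor(log2(1 / p_max)) guarantees 2**r <= len(probs), so with strictly
    positive probabilities every group receives at least one token.
    """
    r = math.floor(-math.log2(max(probs)))
    n_groups = 2 ** r
    groups = [[] for _ in range(n_groups)]
    masses = [0.0] * n_groups
    # Greedy balancing: place each token, most likely first, in the lightest group.
    for i in sorted(range(len(probs)), key=lambda i: probs[i], reverse=True):
        g = min(range(n_groups), key=lambda j: masses[j])
        groups[g].append(i)
        masses[g] += probs[i]
    return r, groups


def embed_step(probs, bits, rng):
    """Consume the next r bits: they pick a group, then a token is drawn
    from that group in proportion to its model probability."""
    r, groups = group_tokens(probs)
    index = int("".join(map(str, bits[:r])), 2) if r else 0
    members = groups[index]
    token = rng.choices(members, weights=[probs[i] for i in members], k=1)[0]
    return token, r


def extract_step(probs, token):
    """Receiver side: rebuild the same groups and read off the group index."""
    r, groups = group_tokens(probs)
    index = next(g for g, members in enumerate(groups) if token in members)
    return [int(b) for b in format(index, f"0{r}b")] if r else []
```

Only the group index carries information; sampling within the group keeps the emitted token close to the model's own distribution, and the closer the groups are to equal mass, the smaller the statistical gap.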

Natural Steganography: cover-source switching for better steganography

Patrick Bas (2016)

This paper proposes a new steganographic scheme relying on the principle of cover-source switching, the key idea being that the embedding should switch from one cover-source to another. The proposed implementation, called Natural Steganography, considers the sensor noise naturally present in raw images and uses the principle that, by adding a specific noise, the steganographic embedding tries to mimic a change of ISO sensitivity. The embedding methodology consists of 1) perturbing the image in the raw domain, 2) modeling the perturbation in the processed domain, and 3) embedding the payload in the processed domain. We show that this methodology is easily tractable whenever the processes are known and enables embedding large and undetectable payloads. We also show that previously used heuristics, such as synchronization of embedding changes or detectability after rescaling, can be explained by operations such as color demosaicing and down-scaling kernels, respectively.

Multimedia; Cryptography and Security
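Step 1 of the methodology can be illustrated under a common simplifying assumption: an affine raw-sensor noise model in which a pixel with expected value x has noise variance a·x + b, where (a, b) depend on the ISO setting. Because variances of independent Gaussian noises add, adding independent noise of variance (a2 - a1)·x + (b2 - b1) to a capture at the lower ISO yields the variance expected at the higher ISO. `add_iso_noise` and the parameter values are hypothetical; the sketch covers only the raw-domain perturbation, not the modeling in the processed domain or the payload embedding.

```python
import random


def add_iso_noise(raw, a1, b1, a2, b2, rng=None):
    """Add independent Gaussian noise so that raw pixels whose noise variance
    is a1*x + b1 end up with variance a2*x + b2 (hypothetical affine model)."""
    if rng is None:
        rng = random.Random()
    out = []
    for x in raw:
        extra_var = (a2 - a1) * x + (b2 - b1)
        if extra_var < 0:
            raise ValueError("target ISO must be noisier than the cover ISO")
        out.append(x + rng.gauss(0.0, extra_var ** 0.5))
    return out
```

The added noise is a function of the pixel value only, which is what lets the perturbation be modeled statistically once the processing pipeline is known.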

Generative Steganography by Sampling

Zhuo Zhang, Jia Liu, Yan Ke (2018)

In this paper, a novel data-driven information hiding scheme called generative steganography by sampling (GSS) is proposed. Unlike in traditional modification-based steganography, in our method the stego image is directly sampled by a powerful generator: no explicit cover is used. Both parties share a secret key used for message embedding and extraction. The Jensen-Shannon divergence is introduced as a new criterion for evaluating the security of generative steganography. Based on these principles, we propose a simple practical generative steganography method that uses semantic image inpainting. The message is written in advance to an uncorrupted region that needs to be retained in the corrupted image. Then, the corrupted image with the secret message is fed into a Generator trained by a generative adversarial network (GAN) for semantic completion. Message loss and prior loss terms are proposed for penalizing message extraction error and unrealistic stego images. In our design, we first train a generator whose training target is the generation of new data samples from the same distribution as that of existing training data. Next, for the trained generator, backpropagation through the message and prior losses is used to optimize the coding of the input noise data for the generator. The presented experiments demonstrate the potential of the proposed framework based on both qualitative and quantitative evaluations of the generated stego images.

Multimedia; Cryptography and Security
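For reference, the Jensen-Shannon divergence between two discrete distributions P and Q is JS(P, Q) = ½ KL(P‖M) + ½ KL(Q‖M) with M = (P + Q)/2; it is symmetric and, with base-2 logarithms, lies in [0, 1]. Which distributions the paper compares (for example, normalized pixel histograms of cover and stego images) is not stated in the abstract, so the inputs below are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, always in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture; m_i > 0 wherever p_i or q_i > 0
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Unlike plain KL, this stays finite when one distribution assigns zero probability to an outcome the other allows, which makes it convenient as a security score for sampled stego data.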

A Neural Network-Based Linguistic Similarity Measure for Entrainment in Conversations

Mingzhi Yu, Diane Litman, Shuang Ma (2021)

Linguistic entrainment is a phenomenon where people tend to mimic each other in conversation. The core instrument to quantify entrainment is a linguistic similarity measure between conversational partners. Most of the current similarity measures are based on bag-of-words approaches that rely on linguistic markers, ignoring the overall language structure and dialogue context. To address this issue, we propose to use a neural network model to perform the similarity measure for entrainment. Our model is context-aware, and it further leverages a novel component to learn the shared high-level linguistic features across dialogues. We first investigate the effectiveness of our novel component. Then we use the model to perform similarity measure in a corpus-based entrainment analysis. We observe promising results for both evaluation tasks.

Computation and Language; Machine Learning
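For contrast with the proposed neural measure, the kind of bag-of-words similarity the abstract criticizes can be as simple as the cosine similarity of word-count vectors for two partners' turns. This is a generic, hypothetical baseline (whitespace tokenization, no selection of linguistic markers), not the measure used in the paper; `bow_similarity` is an illustrative name.

```python
import math
from collections import Counter


def bow_similarity(turn_a, turn_b):
    """Cosine similarity of word-count vectors for two conversational turns."""
    a = Counter(turn_a.lower().split())
    b = Counter(turn_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0  # empty turns count as dissimilar
```

Because word order and surrounding turns are discarded, "dog bites man" and "man bites dog" score 1.0 (up to rounding), which is the loss of language structure and dialogue context that motivates a context-aware model.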