On Attribution of Deepfakes


Abstract in English

Progress in generative modelling, especially generative adversarial networks, have made it possible to efficiently synthesize and alter media at scale. Malicious individuals now rely on these machine-generated media, or deepfakes, to manipulate social discourse. In order to ensure media authenticity, existing research is focused on deepfake detection. Yet, the adversarial nature of frameworks used for generative modeling suggests that progress towards detecting deepfakes will enable more realistic deepfake generation. Therefore, it comes at no surprise that developers of generative models are under the scrutiny of stakeholders dealing with misinformation campaigns. At the same time, generative models have a lot of positive applications. As such, there is a clear need to develop tools that ensure the transparent use of generative modeling, while minimizing the harm caused by malicious applications. Our technique optimizes over the source of entropy of each generative model to probabilistically attribute a deepfake to one of the models. We evaluate our method on the seminal example of face synthesis, demonstrating that our approach achieves 97.62% attribution accuracy, and is less sensitive to perturbations and adversarial examples. We discuss the ethical implications of our work, identify where our technique can be used, and highlight that a more meaningful legislative framework is required for a more transparent and ethical use of generative modeling. Finally, we argue that model developers should be capable of claiming plausible deniability and propose a second framework to do so -- this allows a model developer to produce evidence that they did not produce media that they are being accused of having produced.

Download